FOREWORD


The  papers  presented  at  the  Twentieth  Annual  Conference  of  the 
Military  Testing  Association  came  from  the  business,  educational,  and 
military  communities,  both  foreign  and  domestic.    The  papers  reflect 
the  opinions  of  their  authors  only  and  are  not  to  be  construed  as  the 
official  policy  of  any  institution,  government,  or  branch  of  armed 
service. 
MTA KEYNOTE ADDRESS:
QUALITY  OF  LIFE 
RADM  W.  H.  STEWART,  USCG 
Chief,  Office  of  Personnel 


Thank  you  Capt.  Ferguson. 

On  behalf  of  the  Coast  Guard,  I  want  to  add  my  personal  welcome  to  each 
of  you  to  the  twentieth  annual  MTA  conference.    This  conference  will  cap  two 
decades  of  effort  to  exchange  technical  information  and  know-how  in  the  personnel 
management area. For twenty years many of you have made special efforts to
present scholarly papers. For twenty years, each of the services has made
special  efforts  to  host  this  conference.    That  the  conferences  have  continued 
for  twenty  years  is  a  testimony  to  their  worth.    That  your  membership  and 
attendance  now  includes  representatives  from  the  academic  communities,  from 
other  government  agencies,  from  private  industry,  and  from  military  services 
of  other  countries  is  also  an  indication  that  much  of  the  information  you  seek 
to  exchange  is  of  a  broad  and  possibly  universal  nature. 

At  this  time,  I  would  like  to  welcome  in  particular  Colonel  Seuberlich 
and  Dr.  Puzicha  from  Germany  and  also  Squadron  Leader  Thompson  from  Australia, 
as  well  as  Mr.  Beel  from  the  Royal  Navy,  and  Colonel  Leach  from  Canada.  I 
understand  that  Canada  has  volunteered  to  host  this  convention  at  Toronto  in 
1980.    I  also  want  to  recognize  Mr.  Foley  of  the  Navy  Personnel  Research  and 
Development  Center  who  will  host  this  convention  in  San  Diego  next  year.  I 
wish  to  welcome  the  participants  from  the  Universities,  from  private  industry, 
and  from  other  government  agencies.    Also,  I  want  to  acknowledge  the  presence 
of  the  commanding  officers  of  our  Coast  Guard  training  units  and  their  staffs 
who have been attending the Commanding Officer/Training Officer Conference.

I have reviewed the proceedings of your last three meetings and, though
I  am  not  scientifically  qualified  to  judge  the  merits  of  your  papers,  I  can 
say,  as  a  qualified  layman,  that  you  generate  a  considerable  amount  of  material. 
Considering  the  volume,  complexity,  and  specificity  of  your  output,  I'm  not 
certain  whether  I  admire  most  the  people  who  are  delivering  this  information 
or the ones who are receiving and understanding it. In any case, it is not
hard  to  understand  why  you  always  have  a  full  schedule. 

As Chief of Personnel of the Coast Guard, I am very concerned with utility,
efficiency,  and  productivity;  for  these  are  the  measures  of  individual  and 
organizational  performance.    As  Chief  of  Personnel,  I  also  wonder  if  the  Coast 
Guard  men  and  women  of  today  can  handle  the  Coast  Guard  of  tomorrow.    Most  of 
the  instruments  and  procedures  that  you  develop  are  designed  to  help  answer 
questions  of  this  sort  and  to  improve  the  efficiency  or  productivity  of  the 
organization  supporting  your  research.    However,  such  improvements  may  or  may 
not  benefit  the  individuals  who  are  being  managed.    Almost  always,  the  concept 
of  utility  ignores  the  individuals  in  the  organization  because  the  utility  is 
designed to benefit the organization. This, of course, is good for the organization,
and what is good for the organization generally returns benefits to the
individuals in the organization. But I have seen some great exceptions. We
have kept ships at sea too long; and we have permitted long work days, and
[illegible] which requires personal [illegible] complain. But we cannot
justify working people 16 [illegible] because there is utility to it. The point
is that we must [illegible] balance between benefits to the organization and to
[illegible] of benefiting both the individual and the organization is to
increase professionalism at all personnel levels. This is the major goal
of the Commandant of the Coast Guard. That is, we will encourage and assist
professional development of benefit to the individual and of value to the
organization.

It is the balance between individual and organizational benefits which
determines the quality of life. This balance must continually be reestablished
[illegible]. For example, the Coast Guard has just approved the policy
to provide the opportunity for women to serve in all billets on board all ships
[illegible], including the billet of commanding officer. The only
restriction is that adequate personal privacy can be provided.

This decision was not the product of an organizational utility model. For one
thing, [illegible] that the Coast Guard has done well even without women.
Therefore, the decision to open opportunities for women was based primarily
on [illegible] and justice. This decision is the product of those
social, philosophical, political, and legal forces which are continuously
evolving and changing our society. So, with one value judgment, a huge change
has been introduced into the Coast Guard.

[illegible] change was to make greater opportunities available
to women. But, as I indicated before, I am also concerned with both
individual and organizational performance; and certainly there is no intention
of putting women (or men, for that matter) into positions where they are not
qualified or where they cannot perform adequately. Not only would such assignment
be unfair to the individual woman (or man), but also it would reduce
[illegible] levels. We all know that the ability to perform is
[illegible] individual aptitude, training and motivation. If a woman wants to
serve in a previously all male billet or job, then she has (by definition)
adequate motivation to perform. But, she may not be qualified because of lack
of training or experience even though she has the aptitude. This is also true
for most male recruits we take into the Coast Guard.

The  question  has  been,  and  still  is,  who  can  best  be  trained,  that  is,  who 
has the aptitude for training? It is in this area that your classification
tests  have  made  a  valuable  contribution.    But,  do  these  tests  work  equally  well 
for  women  as  for  men?    Our  mechanical  aptitude  tests  are  effective  in  predicting 
mechanical  learning  ability  and  knowledge  for  the  white  male  majority;  but  most 
women, and many minorities, perform at the chance score level on these tests.
This  implies  either,  that  most  women  and  many  minorities  have  no  mechanical 
aptitude  of  use  to  the  Coast  Guard,  or  that  we  have  not  yet  built  tests  that  are 
culture-fair  in  evaluating  their  ability  to  learn  to  do  mechanical  work.  Both 
my  staff  and  I  believe  that  the  tests  are  culturally  biased.    Even  so,  how  can 
I  implement  a  policy  which  permits  women  to  go  into  enlisted  ratings  which  are 
heavily  loaded  with  mechanical  skill  requirements,  if  all  we  know  about  women's 
mechanical aptitudes is that they score at the chance level on our mechanical
tests? It is obvious to me that we need new test instruments (which we are now
building) to tell us about the mechanical abilities of women and minorities.


Of  course  we  also  need  to  know  if  the  tests  are  valid  predictors  of  performance 
both  in  school  and  on  the  job. 

One of the major problems has been a lack of knowledge about the job.
However, I expect that your efforts in the job-task analysis will provide basic
information which can be used to evaluate and validate not only the test instruments,
but also the curriculums of our training schools, and even the structure
and  composition  of  the  job  itself.    This  effort  is  extremely  important.    It  has 
been  estimated  that  a  work  appraisal  system  for  Civil  Service  could  cost  the 
entire Federal Government a half a billion dollars a year. However, if such a
system  could  increase  productivity  by  as  much  as  two  percent  it  would  effect 
savings  far  outweighing  its  cost. 

These  considerations  of  course  involve  technical  questions  which  you  as 
professionals  in  this  field  must  answer  with  empirical  studies.    These  studies, 
I am told, must conform to the new Uniform Guidelines on Employee Selection
Procedures just released. I understand that the intent of the Uniform Guidelines
is to assure equity and justice and to mandate fair recognition of the
individual's  potential,  regardless  of  group  membership.    So,  it  seems  that  the 
social,  philosophical,  political,  and  legal  forces  have  resulted  in  producing 
these  guidelines;  just  as  they  did  in  our  Coast  Guard  decision  to  provide  equal 
opportunities  to  women. 

However,  these  great  and  elegant  decisions  cannot  be  fully  implemented 
without  the  supporting  technologies  to  help  the  organization  adapt  to  these 
changes. 

We ask your help to assist us with your technology as we accommodate to the
change that equity dictates. Change, by itself, threatens organizational efficiency.
This change (especially in personnel) is greater today than ever
before. Change presents a problem. This has always been true. For example,
a  young  Naval  officer  wrote  in  his  journal, 

Change thus succeeding change with bewildering rapidity...find all
who have sought to keep up...have been called upon to absorb new
ideas before the last has been assimilated.

This was written in 1879, almost a hundred years ago.

If  change  is  handled  properly,  it  can  improve  the  quality  of  service  life, 
maintain  or  improve  productivity,  and  increase  the  level  of  professionalism. 

I believe this is your mission. You are responsible for the research and
development efforts needed to supply us with new tools, instruments, and procedures
and knowledges which help us as managers to effectively accommodate to
new situations. I am also confident that you will anticipate future changes,
and even become instruments of change yourselves.

I  am  sure  you  will  rise  to  this  occasion  because  it  is,  after  all,  your 
life work. To the extent that you are always concerned for the individual, and
assume  a  responsibility  to  improve  the  quality  of  life  of  each  man  and  woman  in 
the  service,  both  the  individual  and  the  organization  will  benefit. 

This  is  my  belief,  but  only  you  can  make  it  happen. 


Thank you.

20th Annual Conference of the
MILITARY  TESTING  ASSOCIATION 


Officers,  Chairmen,  and  Committee 

MTA  President 
CAPTAIN  JAMES  E.  FERGUSON 

MTA  Secretary 
MR.  JOHN  A.  BURT 


Chairmen:  Committee: 

RICHARD  C.  WILLING  Program 

CW04  JOHN  E.  SCHWARTZ  Financial 

CW03  LARRY  N.  MONROE  Audiovisual 

LCDR  CLINTON  W.  CARTER  Social 

LT  LARRY  C.  YOUNG  Registration 




20th Annual Conference
MILITARY  TESTING  ASSOCIATION 


30  October  -  3  November 


MONDAY,  OCTOBER  30 


Lobby
1200-1900  Registration

Presidential Suite (Room 404)
1600-1800  Steering Committee Meeting

Gazebo Room
1900-2000  Informal Reception


TUESDAY MORNING, OCTOBER 31

South Ballroom
0900-1000  Conference called to order

0900-0915  Greetings by Coast Guard Institute
           Commanding Officer
           CAPT JAMES E. FERGUSON

0915-1000  Keynote Address
           RADM W. H. STEWART, USCG
           Chief, Office of Personnel

1000-1030  Break

1030-1140  INTERNATIONAL PRESENTATIONS

"Strain by Prolonged Duty Hours and Problems as to
Mobility of Soldiers - As Seen by Federal Armed Forces
Association" (20 min.)
COL H.E. SEUBERLICH, German Federal Armed Forces
Association

"Execution of Large Occupational Analysis of the
Royal Navy's Operations Branch" (20 min.)
C.D. BEEL, Royal Navy

"A Strategy for Task Analysis and Criterion Definition
Based on Nonmetric Multidimensional Scaling" (30 min.)
LCOL GLENN M. RAMPTON, Canadian Forces Personnel
Applied Research Unit
1140-1200  Announcements
1200-1300  Lunch

TUESDAY AFTERNOON, OCTOBER 31


Appaloosa Room
1300-1430 


PERSONNEL  APPRA 


Appaloosa Room
1500-1630 


Arabian  Room 
1300-1430


"Quality of ROTC Accessions to the Army Officer Corps"
(15  min. ) 

DR.  ARTHUR  C.F.  GILBERT  and  DR.  RICHARD  S.  WELLINS, 
Army  Research  Institute,  and  DR.  JOHN  I.  WELDON,  U.S. 
Army  Training  and  Doctrine  Command 

"Prediction of Reading Grade Levels of Service
Applicants from Armed Services Vocational Aptitude
Battery (ASVAB)" (30 min.)
JOHN  J.  MATHEWS  AND  LONNIE  D.  VALENTINE,  JR., 
Brooks  Air  Force  Base,  WAYNE  S.  SELLMAN,  Randolph 
Air  Force  Base 


EXAMINATION  ITEMS 
Evaluating and Improving

"Objective Evaluation of Correspondence Course Items"
(30  min. ) 

DR.  ANDREW  N.  DOW,  USNETPDC 

"The  Emergence  of  an  Item-Writing  Technology" 
(30  min. ) 

GALE  ROID  and  TOM  HALADYNA,  Oregon  State  System  of 
Higher  Education 


METHODS  OF  DETERMINING  PERSONNEL  AVAILABILITY 

"PAM:    A  Methodology  for  Predicting  Air  Force  Personnel 
Availability"  (20  min.) 

H.  ANTHONY  BARAN,  ANDREW  J.  CZUCHRY,  JOHN  C.  GOCLOWSKI, 
DUNCAN  L.  DIETERLY,  FREDRIC  F.  PHILLIPS,  STUART  E.  PESKOE, 
and ANTHONY J. LOFASO, Air Force Human Resources Laboratory

Symposium:    Methodology  for  Mobilization 
Population  Inventory 
Chairman: DR. JACK M. HICKS

"Some Implications of Commercial Test Normings
for Mobilization Surveys"

R.  F.  BOLDT,  Educational  Testing  Service 

"Measuring  the  Military  Base  Population  of  the  1980's" 
M. A. FISCHL, US Army Research Institute

"Development of a Mobilization Population Inventory
Using  Existing  ASVAB  Data  Banks" 
GEORGE  V.  RUX  and  WILLIAM  GRAHAM,  Military 
Enlistment  Processing  Command 

"Air  Force  Experience  with  PROJECT  TALENT" 
LONNIE  D.  VALENTINE,  JR.,  Air  Force  Human 
Resources  Laboratory 


PERFORMANCE  MEASUREMENT 

"Complexity  of  Flight  Path  Data  as  an  Index  of 
Skill  in  Piloting  Performances  from  a  Flight 
Simulator  Based  Job-Sample  Test"  (25  min.) 
BRIAN D. SHIPLEY, JR., US Army Research Institute
Field  Unit,  Fort  Rucker,  Alabama 

"Evaluation  of  Intelligence  Producing  Capability  of 
Selected  Combat  Arms  Units"  (40  min.) 
EARL  W.  RUBRIGHT,  80th  MTC/NSA 
ALVALINE  JACKSON 

"Learning  Aptitude,  Error  Tolerance,  and 
Achievement  Level  as  Factors  of  Performance 
in a Visual-Tracking Task" (25 min.)
BRIAN  D.  SHIPLEY,  JR.,  US  Army  Research  Institute 
Field  Unit,  Fort  Rucker,  Alabama 


VALIDATION-PREDICTION, Session 1

"The Impact of Valid Selection Procedures on
Workforce Productivity" (25 min.)
FRANK L. SCHMIDT, ROBERT C. McKENZIE, and
TRESSIE W. MULDROW, U.S. Civil Service
Commission and JOHN E. HUNTER, Michigan State
University

"Job  Performance  of  USAF  Bypassed  Specialists"  (20  min.) 
CAPT  WILLIAM  H.  CUMMINGS  and  CAPT  DAVID  S.  VAUGHAN,  USAF 
Occupational  Measurement  Center 

"Analysis  of  Heavy  Equipment  Operator  Jobs"  (25  min.) 
SIDNEY A. FINE, HOWARD C. OLSON, DAVID D.
MYERS, and MARGARETTE C. JENNINGS, Advanced
Research  Resources  Organization 


VALIDATION-PREDICTION, Session 2


"Predictive  Utility  of  the  Officer  Evaluation  Battery 
(OEB)"  (15  min.) 

DR.  ARTHUR  C.F.  GILBERT,  US  Army  Research  Institute 


"Assessment  Center  Variables  as  Predictors  of  On-Job 
Performance  Characteristics"  (25  min.) 
DR.  CHARLES  H.  CORY,  NPRDC 

"Using  an  Assessment  Center  to  Predict  Leadership 
Course  Performance  of  Army  Officers  and  NCOs"  (25  min.) 
FREDERICK  N.  DYER  and  RICHARD  E.  HILLIGOSS, 
Army  Research  Institute  Field  Unit,  Fort 
Benning,  Georgia 

"Validity  of  Associate  Ratings  of  Performance  Potential 
by  Army  Aviators"  (15  min.) 

ROBERT  F.  EASTMAN,  US  Army  Research  Institute  Field  Unit, 
Fort Rucker, Alabama, and MARIE LEGER, US Army Research
Institute 

WEDNESDAY  MORNING,  NOVEMBER  1 


Appaloosa Room

0800-0935  OCCUPATIONAL-TASK ANALYSIS, Session 1

Issues  and  Answers 


"Obstacles to and Incentives for Standardization of Task
Analysis  Procedures"  (20  min.) 

ROBERT  W.  STEPHENSON  and  HENDRICK  W.  RUCK,  Air  Force 
Human  Resources  Laboratory 

"Task  Analysis:  Destination  or  Journey"  (15  min.) 
DR.  MELVIN  D.  MONTEMERLO  and  DR.  FRANK  M.  AVERSANO 
US  Army  Training  Support  Center 

"Four  Fundamental  Criteria  for  Describing  the  Tasks  of 
an  Occupational  Specialty"    (20  min.) 
DR.  WALTER  E.  DRISKILL  and  CAPT  FRANK  C.  GENTNER,  USAF 
Occupational Measurement Center

"Two  Applications  of  Occupational  Survey  Data  in  Making 
Training  Decisions"    (20  min.) 

CAPT  DAVID  S.  VAUGHAN,  ATC  Technology  Applications  Center 
CAPT  JOHN  R.  WELSH 

"The  Stability  Over  Time  of  Air  Force  Enlisted 
Career  Ladders  as  Observed  in  Occupational 
Survey  Reports"  (20  min. ) 

WALTER  E.  DRISKILL  and  FREDERICK  B.  BOWER,  JR., 
USAF Occupational Measurement Center

OCCUPATIONAL-TASK ANALYSIS, Session 2
Using  Instructional  Systems  Development 


"The  Collection  and  Prediction  of  Training  Emphasis 
Ratings  for  Curriculum  Development"  (20  min.) 
HENDRICK  W.  RUCK,  NANCY  A.  THOMPSON,  AND  SQDN  LDR 
DAVID  C.  THOMSON,  USAF  Human  Resources  Laboratory 

"Data  Base  to  Determination  of  Training  Content:  A 
Manageable  Solution"  (20  min.) 
D.D.  DAVIS,  CNET 

"Using  the  Computer  to  Build  the  Task  Inventory"  (15  min.) 
THOMAS M. ANSBRO, CNET

"Systematic Instructional Validation Through Testing"
(;5  min. ) 

DR.  MARJORIE  A.  KUENZ  and  FREDERICK  C.  ROBERTS,  III 
Naval  Health  Sciences  Education  and  Training  Command 


STATISTICAL  AND  MEASUREMENT  METHODOLOGIES,  Session  1 


"A  Primer  of  Item  Response  Theory"  (30  min.) 
THOMAS  A.  WARM,  US  Coast  Guard  Institute 

"A  New  Procedure  to  Make  Maximum  Use  of  Available 
Information  When  Correcting  Correlations  for  Restriction 
in  Range  Due  to  Selection"  (30  min.) 
DR. JAMES O. BOONE, Civil Aeromedical Institute, Federal
Aviation  Agency 


STATISTICAL  AND  MEASUREMENT  METHODOLOGIES,  Session  2 


"A  Comparison  of  Three  Models  for  Determining  Test 
Fairness"  (25  min. ) 

DR.  MARY  A.  LEWIS,  Civil  Aeromedical  Institute,  Federal 
Aviation  Agency 

"A  Method  to  Evaluate  Performance  Reliability  of 
Individual  Subjects"  (15  min.) 

ALAN  E.  JENNINGS,  Civil  Aeromedical  Institute,  Federal 
Aviation  Agency 


"A Comparison of Two Criterion-Referenced Scoring Procedures
for an Answer-Until-Correct, Multiple-Choice Performance
Test"  (20  min. ) 

DR.  JOHN  B.  MEREDITH,  JR.  and  J.  THOMAS  MARTIN,  JR.,  Data- 
Design  Laboratories 

"An  Analysis  of  the  OE  Concept  and  Suggested  Improvements" 
(30  min. ) 

DR.  CLAY  E.  GEORGE  and  HENRY  L.  KINNISON,  Texas  Tech 
University  and  H.  WAYNE  SMITH 

Palomino  Room 

0800-0930  VALIDATION-PREDICTION, Session 3


"Performance Test Objectivity: Comparison of Interrater
Reliabilities  of  Three  Observation  Formats"  (30  min.) 
GERALD  J.  LAABS,  Navy  Personnel  R&D  Center 
WILLIAM  A.  NUGENT 


"Prediction  of  Field  Artillery  Officer  Performance" 
(15  min. ) 

ARTHUR C.F. GILBERT, RAYMOND O. WALDKOETTER, and
ANTHONY  E.  CASTELNOVO,  US  Army  Research  Institute 

Palomino  Room 

1000-1130  VALIDATION-PREDICTION,  Session  4 

Symposium:    Innovative  Test  Validation  Strategies 
Chairman:    MARVIN  H.  TRATTNER 

"Construct  Validity" 

BRIAN  S.  O'LEARY,  U.S.  Civil  Service  Commission 

"Test  of  a  New  Model  of  Validity  Generalization: 
Results  for  Tests  Used  in  Clerical  Selection" 
KENNETH  PEARLMAN  and  FRANK  L.  SCHMIDT,  U.S.  Civil 
Service  Commission,  JOHN  E.  HUNTER,  Michigan 
State  University 

"Synthetic  Validity" 

MARVIN H. TRATTNER, U.S. Civil Service Commission


WEDNESDAY  AFTERNOON,  NOVEMBER  1 


Appaloosa Room

1300-1430  OCCUPATIONAL  TASK  ANALYSIS,  Session  3 

Instructional  Systems  Development  (ISD)  and  NEPDIS 
Overview

"Scheduling Formal School Training to Maximize
Cost Effectiveness" (20 min.)
DOUG GOODGAME, Texas A&M University

"Methods for Determining Safety Training Priorities for Job
Tasks" (20 min.)

NANCY  A.  THOMPSON  and  HENDRICK  W.  RUCK,  Air  Force  Human 
Resources  Laboratory 

Appaloosa Room

1500-1630  OCCUPATIONAL-TASK  ANALYSIS,  Session  4 

Applying  Task  Analysis  Methodology 

"Methods  for  Collecting  and  Analyzing  Task  Analysis 
Data"  (20  min.) 

A.  JOHN  ESCHENBRENNER  and  PHILIP  B.  DeVRIES,  McDonnell 
Douglas  Astronautics  Co.,  HENDRICK  W.  RUCK,  Air  Force 
Human  Resources  Laboratory 

"Methodology  for  Selection  and  Training  of 
Artillery  Forward  Observers  Job  Analysis"  (20  min.) 
JOHN  B.  MOCHARNUK  and  RUTH  ANN  MARCO, 
McDonnell  Douglas  Astronautics  Co. 

"Observer  Self-Location  Ability  and  Its  Relationship 
to  Cognitive  Orientation  Skills"  (30  min.) 
JOHN R. MILLIGAN and RAYMOND O. WALDKOETTER,
Army  Research  Institute  Field  Unit,  Fort  Sill, 
Oklahoma


"Job Analysis in the US Army Medical Training Environment"
(20  min. ) 

J. S. TARTELL, US Army

Arabian  Room 

1300-1430  SIMULATORS  AND  SIMULATION,  Session  1 

Design,  Evaluation,  and  Personnel  Performance 

"Evaluation  of  Troubleshooting  Simulator" 
(30  min.) 

DALE  A.  STEFFEN  and  ANITA  S.  WEST,  Denver 
Research Institute

"Methodology for Evaluating Operator Performance on
Tactical Operational Simulator/Trainers" (30 min.)
DR. CHARLES W. HOWARD, Army Research Institute, Fort
Bliss,  Texas 

"Critical  Performances  of  Battalion  Command  Groups" 
(30  min.) 

IRA  T.  KAPLAN  and  HERBERT  F.  BARBER,  Army  Research 
Institute,  Fort  Leavenworth,  Kansas 

Arabian  Room 

1500-1630  SIMULATORS  AND  SIMULATION,  Session  2 

Design, Evaluation and Personnel Performance

ROBERT G. HENDERSON, Defense Language Institute, Foreign
Language  Center 

"Monte  Carlo  Computer  Programs  for  Simulating  Selection 

Decisions from Personnel Tests" (30 min.)

MICHAEL C. THEW and JOHNNY J. WEISSMULLER, Air Force

(20 min.)

CIVILIAN GROUND SAFETY OFFICER JOB AND TRAINING
REQUIREMENTS  SURVEY 

By 

Douglas  K.  Cowan 
Air  Force  Human  Resources  Laboratory 
Brooks  AFB,  Texas 

The  opinions  and  conclusions  expressed  in  this  paper 
are  those  of  the  author  and  are  not  necessarily 
those  of  the  United  States  Air  Force. 


I.  INTRODUCTION 

This  study  was  the  result  of  an  expressed  need  by  the  Air  Force 
Inspection and Safety Center (AFISC) to determine job types existing
within the civilian ground safety officer area and to identify training
requirements essential to the career development of the job incumbents.
Consequently, the objectives of this study were to identify significant
job types within the civilian ground safety officer population and the
job characteristics which differentiate the identified job types from
one another; to compare the task training emphasis recommended by job
incumbents to the training emphasis placed on tasks within the Ground
Safety Officer course (CIP05D); and to construct a recommended career
progression ladder for civilian ground safety officers comparable to
that which exists for Air Force enlisted members, inasmuch as no
career progression ladder currently exists for civilian ground safety
officers.

II. METHOD

The  job  inventory  used  to  collect  job  information  from  the 
civilian  ground  safety  officers  was  developed  by  the  Air  Force  Inspection 
and  Safety  Center  (AFISC),  with  the  assistance  of  the  USAF  Occupational 
Measurement  Center  (OMC)  and  the  Air  Force  Human  Resources  Laboratory 
(AFHRL).    The  inventory  was  based  upon  Air  Force  job  survey  procedures 
spelled  out  in  AFR  35-2,  Occupational  Analysis.    It  consisted  of  a 
background  information  section,  which  included  personal  and  job-related 
data  items,  and  a  list  of  295  significant  work  tasks  organized  under 
eleven  major  duty  headings.    In  the  background  information  section, 
each  incumbent  was  questioned  concerning  formal  education,  pay  grade, 
training courses completed, and other job-related items. The listing
of tasks was reviewed by the incumbent for tasks performed in his
current job. Each task performed was rated using a relative 9-point
time spent scale to obtain an index that could be used to estimate
how his time was distributed across all tasks in his job.
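
As a rough illustration of how a relative time-spent rating can yield an estimate of time distribution, the sketch below divides each task's rating by the sum of the incumbent's ratings. This normalization is an assumption about the procedure rather than a quotation from AFR 35-2, and the function name, task names, and ratings are hypothetical.

```python
def percent_time(ratings):
    """Convert relative time-spent ratings into percent-time estimates.

    `ratings` maps a task name to the 1-9 rating the incumbent gave it.
    Tasks the incumbent does not perform are omitted (or rated 0).
    Assumption: percent time = rating / sum of all ratings * 100.
    """
    for task, rating in ratings.items():
        if not 0 <= rating <= 9:
            raise ValueError(f"rating for {task!r} must be between 0 and 9")
    total = sum(ratings.values())
    if total == 0:
        return {task: 0.0 for task in ratings}
    return {task: 100.0 * rating / total for task, rating in ratings.items()}


# Hypothetical incumbent who performs three tasks from the inventory.
example_ratings = {
    "inspect facility": 9,
    "prepare accident report": 3,
    "conduct safety briefing": 6,
}
example_percent = percent_time(example_ratings)
for task, pct in example_percent.items():
    print(f"{task}: {pct:.1f}%")
```

Because the ratings are relative, only their proportions matter: an incumbent who rates every task 9 gets the same estimated distribution as one who rates every task 1.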
The job inventory was administered during April and May 1977
by  AFISC  to  Department  of  the  Air  Force  civilian  employees  who  were 
assigned  duty  as  ground  safety  officers  and  who  had  volunteered  to 
complete  the  survey.    A  total  of  212  job  inventories  was  received  from 
the  field  for  analysis,  which  represented  about  fifty  percent  of  the 
population. 

An  identical  duty  and  task  listing,  but  with  a  9-point  scale  to 
reflect  training  emphasis  recommended  for  each  task,  was  sent  to 
approximately  50  civilian  ground  safety  officers  at  duty  locations 
across  the  continental  United  States  to  obtain  an  estimate  of  needed 
training  emphasis  on  each  task.    Forty-six  civilian  ground  safety 
officers  voluntarily  completed  the  ratings  and  returned  the  survey 
booklets  for  analysis. 

A  similar  9-point  rating  scale  using  the  same  tasks  and  duties 
was  forwarded  to  the  School  of  Engineering,  Arizona  State  University, 
Tempe,  Arizona,  to  obtain  training  emphasis  ratings  from  instructors 
in  the  Ground  Safety  Officer  Course  (CIP05D)^  to  gain  an  estimate  of 
current  emphasis  placed  on  training  for  the  tasks  listed  in  the  job 
inventory.


III.  RESULTS 

Job  Survey  Analyses 

Job  analyses  were  performed  using  several  of  the  Comprehensive 
Occupational  Data  Analysis  Programs  (CODAP)  described  by  Archer  (1977), 
Christal  and  Ward  (1967),  Morsh  and  Christal  (1966),  and  Christal  (1974). 
Six  specific  job  types  were  identified  through  the  hierarchical  grouping 
process.    Figure  1  shows  the  six  job  types  and  the  grouping  diagram. 
Nominal  titles  were  assigned  to  each  job  type,  based  upon  a  functional 
analysis  of  the  incumbents'  job  titles  and  assignment  information. 
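
The hierarchical grouping step can be pictured with a small sketch. It is not the CODAP implementation: the similarity measure (overlap of percent time between two job descriptions), the average-linkage merge rule, and all names and data are assumptions chosen for illustration.

```python
from itertools import combinations


def overlap(job_a, job_b):
    """Percent-time overlap: sum over tasks of the smaller percent time."""
    tasks = set(job_a) | set(job_b)
    return sum(min(job_a.get(t, 0.0), job_b.get(t, 0.0)) for t in tasks)


def hierarchical_grouping(jobs):
    """Merge the two most similar groups repeatedly until one remains.

    `jobs` maps an incumbent id to a dict of task -> percent time.
    Returns a list of merges: (members_a, members_b, average_overlap).
    """
    clusters = {name: [name] for name in jobs}
    merges = []

    def average_overlap(x, y):
        pairs = [(m, n) for m in clusters[x] for n in clusters[y]]
        return sum(overlap(jobs[m], jobs[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        x, y = max(combinations(sorted(clusters), 2),
                   key=lambda pair: average_overlap(*pair))
        merges.append((sorted(clusters[x]), sorted(clusters[y]),
                       average_overlap(x, y)))
        clusters[x] = clusters[x] + clusters.pop(y)
    return merges


# Hypothetical percent-time job descriptions for three incumbents.
sample_jobs = {
    "inc1": {"inspect": 60, "report": 40},
    "inc2": {"inspect": 50, "report": 50},
    "inc3": {"traffic training": 80, "report": 20},
}
sample_merges = hierarchical_grouping(sample_jobs)
for members_a, members_b, similarity in sample_merges:
    print(members_a, "+", members_b, f"overlap={similarity:.1f}")
```

Reading the merge list from first to last gives the grouping diagram from the bottom up: early merges join near-identical jobs, and a sharp drop in overlap suggests a boundary between job types.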

Although  six  job  types  were  identified  through  the  grouping  process, 
two  of  the  groups,  GRP  006  and  GRP  028,  appeared  to  be  major  command 
specific  and,  therefore,  outside  of  a  normal  career  progression  route. 
Figure  2  depicts  a  conceptualized  career  ladder  based  upon  an  analysis 
of  the  hierarchical  clustering  of  the  sample  and  the  level  of  the  job 
as  determined  by  background  information  supplied  by  incumbents.  The 
civilian  career  ladder  depicted  is  strikingly  similar  to  the  airman 
career  ladder  presented  in  AFR  39-1,  Airman  Classification  Regulation, 
for  the  safety  specialty,  AFSC  241X0. 


^ Course offered by Arizona State University under Government contract.

Figure 1. Cluster Diagram of Ground Safety Officer Job Types
The average estimated percent time spent by the members of the
six job types was summed by duty. The results are displayed in Table
1. The most time-consuming duty for each job type has been circled to
illustrate the primary function of the group. The distinction between
the ground safety specialist, the traffic safety specialist, and the
AFLC safety specialist was rather clear-cut. However, the differences
for the managerial job types were not as clearly evident. Both the safety
managers and chiefs of safety spread their time across all duties, but
the members of the chief of safety group spent more than 57% of their
time in supervisory tasks (duties A, B, C, & J), while the safety managers
spent [illegible] of their time in the same duties. While the Major Command
Safety Inspector General group and the Chief of Safety group are the most
difficult to distinguish, it should be noted that the Major Command
Safety Inspector General group spends more time in supervisory duties
A, B, and C than the Chief of Safety group (52.18% vs. 44.59%), and more
time in forms, records, and reports, duty E (9.14% vs. 5.06%); but spends
less time in coordination, duty J (7.03% vs. 12.82%) and no time in
Table 1. Estimated Percent Time Spent by Duty for Members of Each Job Type

                                                      Percent Time Spent

                                             Ground        Traffic       Safety    Chief of
Duty                                         Safety Spec.  Safety Spec.  Manager   Safety

A  Organizing and planning                   [?]           [?]           [?]       [?]
B  Directing and implementing                [?]           [?]           [?]       [?]
C  Inspecting and evaluating                 [?]           [?]           [?]       [?]
D  Training                                  [?]           [?]           [?]       [?]
E  Preparing and maintaining forms,          7.33          [?]           [?]       5.06
   records, and reports
F  Performing accident investigations        [?]           1.90          [?]       [?]
G  Performing site or facility safety        [?]           [?]           [?]       [?]
   inspections
H  Conducting traffic safety training        [?]           [?]           [?]       [?]
   and education
I  Preparing ground accident indices         [?]           [?]           [?]       [?]
J  Coordinating and maintaining liaison      [?]           [?]           [?]       12.82
K  Performing general unit safety            [?]           [?]           [?]       [?]
   functions

   Totals*                                   [?]           [?]           [?]       [?]

*Totals do not sum to 100% due to rounding error.


Table 2. Average Number of Tasks Performed and Number of Tasks Performed by Selected Percentages of Time

                             Average Number of   Number of Tasks Accounting for Selected Percentages
Group                        Tasks Performed     of Cumulative Time Spent on the Group Job Description

Safety Manager                     188           Hi   3Ua   92   155   100a   294
Ground Safety Specialist           118           97   105   222
Chief of Safety                    108           24   55   101   215
AFLC Safety Specialist             106           22   31   Q7 S/   274
Traffic Safety Specialist           79           13   30   69   257
Major Command Safety IG             49           11   26   60   139

Table 3. Selected Background Variables by Job Type

                             Total           Average   Average Years   Average Months   Average Months
Group                        Count    Sex    Grade     Education       in Job           on Base

Ground Safety Specialist         3      3      8.20       15.33             19                97
Traffic Safety Specialist       26      2      7.84       13.82             72               100
Safety Manager                 108      0     10.98       14.42             54                81
Chief of Safety                  8      0     12.14       16.38             54               154
Major Command Safety IG          6      0     12.00       14.83             34               100
AFLC Safety Specialist          33      0      9.91       14.56             49               113


The average grade by job type ranged from about GS-8 to GS-12, with
the average grade of the total sample being slightly higher than GS-10.
Members of all job types indicated rather high levels of education, with
the chief of safety group members showing, overall, at least the
attainment of a baccalaureate degree (or equivalent years of education) plus
some additional education completed. The lowest average number of months
in the job (19) was reported by the ground safety specialist, with the
greatest number of months in the job (72) being reported by the traffic
safety specialist. All groups reported fairly long base tenure.


Training  Emphasis  Analyses 

Training emphasis analyses were completed using selected CODAP
programs. Mean ratings of tasks provided by civilian job incumbents
were computed. A Spearman rankorder correlation was computed between
the mean recommended training emphasis ratings provided by the job
incumbents and the percent of members performing the same tasks,
resulting in an r_s = .80. A like correlation coefficient was computed
for estimated percent time spent on the tasks with recommended training
emphasis ratings, which produced an r_s = .79. Both correlation
coefficients are significant at less than the .001 level of confidence,
indicating that recommended training emphasis is very highly related
to task performance. However, a substantial amount of variance
(approximately 36%) in training emphasis is not accounted for by task
performance alone and, as discussed by Ruck, Thompson, & Thomson (1978)
in their paper, "The Collection and Prediction of Training Emphasis
Ratings for Curriculum Development," other factors such as consequences
of inadequate performance, task delay tolerance, task difficulty, etc.
must be considered. A Spearman rather than a Pearson correlation was
computed, because neither percent of members performing nor percent
time spent is a normally distributed variable.
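
The rank-order computation and the "approximately 36%" figure can be illustrated with a minimal sketch. The task ratings below are hypothetical values made up for illustration, not survey data, and the helper names are assumptions rather than CODAP routines. The unexplained-variance step simply takes one minus the squared coefficient reported above.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: mean training emphasis rating and percent of members
# performing, for six imaginary tasks.
emphasis = [4.1, 3.2, 2.9, 2.0, 1.4, 0.8]
pct_performing = [80, 75, 30, 40, 10, 12]
r_s = spearman(emphasis, pct_performing)

# Share of variance left unexplained by the reported coefficient of .80.
unexplained = 1 - 0.80 ** 2

print(f"illustrative r_s = {r_s:.2f}")
print(f"variance not accounted for at r_s = .80: {unexplained:.0%}")
```

Working on ranks rather than raw values is what makes the coefficient insensitive to the skewed distributions of percent members performing and percent time spent.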

Table  4  shows  the  number  of  tasks  in  the  total  sample  job  description 
that  received  a  mean  training  emphasis  rating  (2.53)  or  higher,  the 
estimated  percentage  of  time  accounted  for  by  these  tasks,  and  the 
number  of  these  tasks  that  were  identified  as  being  part  of  the  Ground 
Safety  Officer  school  curricula.    Also  shown  is  the  total  number  of 
tasks  identified  in  the  job  inventory  as  being  taught  in  the  school 
and  the  Spearman  rankorder  correlations  of  job  incumbent  training 
emphasis  ratings  with  percent  of  members  performing  the  tasks  and 
estimated  percent  time  spent  on  the  tasks. 

From Table 4 it can be seen that the Ground Safety Officer school
provides training in fewer than half of the tasks with high recommended
training emphasis (66 out of 140 tasks), but also provides training
on 41 additional tasks which did not receive high recommended training
emphasis.


Table 4. Relationship of Training Emphasis Ratings and Task Performance by Total Sample

                                                                                   Number, Percentage,
Variable  Description                                                              or Correlation

A   Number of tasks with mean or higher recommended training emphasis ratings        140

B   Number of tasks with below the mean recommended training emphasis ratings        155

C   Percentage of job incumbent time accounted for by tasks in variable A             65%

D   Number of tasks in variable A included in Ground Safety Officer school            66

E   Total number of tasks identified as included in Ground Safety Officer school     107

F   Spearman rankorder correlation between recommended training emphasis ratings     .80
    and percent members performing

G   Spearman rankorder correlation between recommended training emphasis ratings     .79
    and estimated percent time spent


A Chi Square test was performed on tasks above and below the mean
on recommended training emphasis versus whether the tasks were or were
not being taught in the school. The computed Chi Square value was
12.74 (df = 1), which is significant beyond the .001 level of confidence.
This finding indicates that the school put relatively more weight on
teaching the tasks with higher, rather than lower, recommended training
emphasis.
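
The test can be retraced from the counts in Table 4 as a rough sketch. The 2 x 2 cell counts are derived here from variables A, B, D, and E (66 taught and 74 untaught tasks at or above the mean; 41 taught and 114 untaught below it). The use of Yates' continuity correction is an assumption, since the paper does not say which form of the statistic was computed; with the correction the result appears to agree with the reported 12.74. The function name is illustrative only.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic (df = 1) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: shrink each deviation by 0.5.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    return statistic


# Cell counts derived from Table 4 (A = 140, B = 155, D = 66, E = 107).
high_taught = 66
high_untaught = 140 - 66      # 74
low_taught = 107 - 66         # 41
low_untaught = 155 - 41       # 114

corrected = chi_square_2x2(high_taught, high_untaught,
                           low_taught, low_untaught, yates=True)
uncorrected = chi_square_2x2(high_taught, high_untaught,
                             low_taught, low_untaught)

CRITICAL_001 = 10.828  # chi-square critical value, df = 1, alpha = .001

print(f"with continuity correction:    {corrected:.2f}")
print(f"without continuity correction: {uncorrected:.2f}")
print("beyond the .001 level:", corrected > CRITICAL_001)
```

The same counts also give the proportions quoted in the text: 66 of 140 high-emphasis tasks (about 47%) are taught, and 41 of the 107 taught tasks fall below the mean.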

The percent time spent values for the taught and untaught tasks
were summed separately for each duty (see Table 5). Inspection of
the time spent values for taught and untaught tasks in the traditional
management course-related duties (A, B, & D) and the nonmanagement
duties (F, I, & K) revealed a much heavier emphasis by the school on
the management areas than on the nonmanagement areas. The remaining
unlisted duties contain a mixture of management, administrative, and
worker-level tasks. The school emphasis on management is one reason
why many of the tasks with higher recommended training emphasis were
not being taught. Another reason is that some of these tasks are better
taught by OJT.
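
As a quick check, the duty-level values reported in Table 5 can be re-added and the taught share of time compared across the two groups of duties. This is only a sketch: the dictionary layout and function names are assumptions, and the figures are copied from Table 5.

```python
# (taught, untaught) percent time spent, copied from Table 5.
management = {
    "Organizing and Planning": (6.75, 4.06),
    "Directing and Implementing": (6.47, 3.84),
    "Training": (2.93, 1.25),
}
nonmanagement = {
    "Performing Accident Investigations": (2.97, 7.19),
    "Preparing Ground Accident Indices": (4.07, 7.40),
    "Performing General Unit Safety Functions": (0.44, 3.67),
}


def totals(duties):
    """Sum the taught and untaught percent time across a group of duties."""
    taught = sum(t for t, _ in duties.values())
    untaught = sum(u for _, u in duties.values())
    return taught, untaught


def taught_share(duties):
    """Fraction of the group's time that falls on tasks taught in the school."""
    taught, untaught = totals(duties)
    return taught / (taught + untaught)


for label, duties in (("management", management),
                      ("nonmanagement", nonmanagement)):
    taught, untaught = totals(duties)
    print(f"{label:<14} taught {taught:5.2f}  untaught {untaught:5.2f}  "
          f"taught share {taught_share(duties):.0%}")
```

Expressing each group as a taught share makes the contrast in the text easier to see: most of the time spent in the management duties falls on taught tasks, while most of the time spent in the nonmanagement duties does not.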


Table  5.  Estimated  Time  Spent  on  Taught  and  Untaught  Tasks  by  Duty  for 
Management  Course-Related  and  Nonmanagement  Duties  

Percent  Time  Spent 

Management  Course-Related  Duties  Taught  Untaught 

A.  Organizing  and  Planning  6.75  4.06 

B.  Directing  and  Implementing  6.47  3.84

D.  Training  2.93  1.25

TOTAL  16.15  9.15 

Nonmanagement  Duties 

F.    Performing  Accident  Investigations  2.97  7.19 

I.    Preparing  Ground  Accident  Indices  4.07  7.40 

K.    Performing  General  Unit  Safety  Functions  .44  3.67 


TOTAL  7.48  18.26


IV.  CONCLUSIONS AND RECOMMENDATIONS


The use of the procedures established by AFR 35-2 in collecting
job information from small populations appears to have produced high
quality information similar to that attained from large military
populations. Different job types were clearly identified through the use
of CODAP, which allowed the conceptualization of a clear progression
path for civilian ground safety officers. Since mean job incumbent
recommended training emphasis ratings were very highly correlated with
percent members performing and estimated percent time spent data, it
must be assumed that these factors can be used interchangeably to account
for most of the information contained in the training emphasis variable.
It appears that a viable method for determining which tasks should receive
training can be developed using the percent members performing and percent
time spent data to determine at what career progression level tasks tend to
be performed, and then the training emphasis data can be used to determine
which tasks need special training. Summaries of background information
provided valuable insight into the grade structure of the work force, as
well as information about the educational level of the job incumbents and
other pertinent information not readily available elsewhere.

The  Ground  Safety  Officer  course  (CIP05D)  appears  to  be  fully 
supportive  of  the  accident  prevention  program  by  providing  management 
safety  education  to  ground  safety  job  incumbents,  since  47%  of  the  tasks 
that job incumbents rated fairly high on recommended training emphasis
are  also  rated  as  being  included  in  the  Ground  Safety  Officer  school. 
The  remaining  tasks  receiving  fairly  high  estimates  of  training  emphasis 
appear  to  be  tasks  that  could  probably  be  trained  during  in-house 
training  sessions,  without  recourse  to  formal  school  training. 

From  the  conclusions,  it  appears  that  the  following  recommendations 
are  in  order: 

1.  That some form of career progression path similar to the one
presented  in  this  paper  be  established  to  formalize  the  present  de-facto 
civilian  career  progression  ladder. 

2.  That  the  relative  priorities  of  technical  and  managerial  skills 
and  knowledges  be  determined  by  field  interviews  that  would  evaluate 
the  consequences  for  job  performance  and  career  progression. 

3.  That consideration be given to assembling a panel of experts
to "scrub down" the existing Ground Safety Officer course by
systematically reviewing task training data on a task-by-task basis.




REFERENCES 


AF Regulation 35-2, Occupational Analysis. Washington, D.C.:
Department of the Air Force, 6 December 1976.

AF Regulation 39-1, Airman Classification Regulation. Washington, D.C.:
Department of the Air Force, 1 June 1977.

Archer, W. B. Computation of group job descriptions from occupational
survey data. PRL-TR-66-12, AD-653 654. Lackland Air Force Base,
TX: Personnel Research Laboratory, Aerospace Medical Division,
December 1966.

Christal, R. E. The United States Air Force occupational research
project. AFHRL-TR-73-75, AD-774 574. Lackland AFB, TX:
Occupational Research Division, Air Force Human Resources
Laboratory, 1974.

Christal, R. E., & Ward, J. H., Jr. The MAXOF clustering model. In
M. Lorr & S. B. Lyerly (Eds.), Proceedings of the conference on
cluster analysis of multivariate data. New Orleans, LA: Catholic
University of America, June 1967, 11.02-11.45.

Morsh, J. E., & Christal, R. E. Impact of the computer on job analysis
in the United States Air Force. PRL-TR-66-19, AD-656 304.
Lackland AFB, TX: Personnel Research Laboratory, Aerospace Medical
Division, October 1966.

Ruck, H. W., Thompson, A., & Thomson, D. C. The collection and
prediction of training emphasis ratings for curriculum development.
Oklahoma City, OK: 20th Annual Conference of the Military Testing
Association, 30 October-3 November 1978.




DETERMINING  THE  TRAINING  REQUIREMENTS  OF 
UNITED  STATES  COAST  GUARD  WARRANT  AND 
COMMISSIONED  OFFICER  BILLETS 


J. W. Cunningham and D. W. Drewes
North  Carolina  State  University  at  Raleigh 


Paper  presented  at  the  annual  meeting  of  the 
Military  Testing  Association 
Oklahoma  City,  November  2,  1978 




Frequently changing duty assignments and staffing patterns in the
U. S. Coast Guard create a continuing need for officers to acquire new
knowledges and skills, which in many cases are best provided through
formal training and education programs. Because of the high costs
associated with such programs, however, it is essential that a
systematic, empirical basis be established which will allow the Coast Guard to
identify and provide within available funds the education and training
most relevant to service requirements. It was in response to this need
that the U. S. Department of Transportation contracted North Carolina
State University to develop procedures and provide a data base that
would allow the Coast Guard to assess its officer knowledge and skill
requirements and to evaluate its postgraduate/post-commission education
and training program against those requirements.

In designing a study for that purpose, we recognized that the military
had historically used job/task analysis to establish job requirements,
which, in turn, provided a basis for the development of training
curricula. For lower-skill jobs employing large numbers of people, it
is feasible to conduct such short-term training within military
facilities. However, the small numbers of people involved and the level,
types, and diversity of professional and technical knowledge required
make it infeasible in most cases for the Coast Guard to conduct the
advanced training needed by its officers. For that reason, the Coast Guard
has generally used colleges, universities, and other institutions to
upgrade knowledges and skills in its officer ranks. Within that context,
we thought it reasonable to define training requirements in terms of
educational courses and training modules, rather than attempting to
derive such requirements through the delineation of specific job tasks.
Indeed, this approach seemed necessitated by the fact that higher
education organizes its curriculum offerings into units, or courses, not
specifically oriented to military requirements (or, for that matter, to
the specific requirements of civilian jobs). Even disregarding this
constraint, we would still have faced, under the more traditional
approach, the problem of accounting for the multitudinous tasks involved
in all of the Coast Guard's officer billet codes.

We outlined four major goals, or phases, for the study:

1.  Phase 1 of the study involved the development of a survey
questionnaire to provide information concerning officer billet requirements
and resources in relation to the Coast Guard's postgraduate/post-
commission education and training program (hereafter referred to as the
PGC program). This questionnaire was designed to obtain respondents'
ratings of (a) their billets' requirements for specified PGC courses and
(b) their own competencies in relation to the same courses. In
addition, the questionnaire sought certain biographical information, as well
as information pertaining to the respondents' attitudes and opinions
about various aspects of the PGC program.

2.  Phase 2 involved the collection of questionnaire responses from
a large, representative sample of Coast Guard officers and warrant
officers.

3.  Phase 3 consisted of descriptive statistical analyses of the
questionnaire response data.

4.  And Phase 4 called for an initial comparative analysis of
educational and training requirements versus human resources in the Coast
Guard's officer billet codes. This analysis involved comparisons
between the respondents' billet and self ratings on specified educational
and training courses.

Instrument  Development 

The data-gathering instrument in this study was titled the "Survey
of Officer Billet Educational Requirements" (or SOBER). This
questionnaire was divided into four main sections.

Section I: Biographical Information

Section I, titled "Information About You," was designed to provide
background information on such factors as the respondent's current grade
level, authorized grade of billet, specialty area, previous training and
education, present educational activities, and educational plans. The
30 response items in this section were selected based on their potential
usefulness in organizing and understanding the data obtained in the
remainder of the questionnaire.

Sections  II  and  III:    Educational  Requirements 
and  Proficiencies 

Sections II and III of the SOBER were designed to obtain
information on (a) billet educational and training requirements and (b) officer
knowledges and skills in relation to those requirements. Section II
instructed the respondents to rate the requirements of their particular
billets for the knowledges and skills represented in 681 course
descriptions; Section III asked them to rate their own proficiencies in terms
of the same courses. These 681 courses were selected from an original
pool of over 5400 that were identified as potentially relevant to the
Coast Guard PGC program areas. The selections were based on program
managers' and representatives' estimates of the importance of the
various courses to the billets associated with their program areas.

The course descriptions comprising the items for Sections II and
III were prepared by consultants at the various program-offering
institutions. These consultants were instructed to divide each course into
its major knowledge units (or topics) and to write a brief descriptive
statement of each unit's content. In composite, the knowledge-unit
statements comprised the course description. Two examples of these
course-description items are shown in Figure 1.


Insert Figure 1 here


The 681 course items were arranged under seven major subject-field
designations which, in turn, were subdivided into a total of 25 more
specific subject categories (see Figure 2).


Insert Figure 2 here


The respondent used a seven-point level-of-knowledge-required scale
to rate his billet on the course items, and a corresponding seven-point
scale to rate his own levels of knowledge relative to the same courses
(see Figure 3). As shown, there is a point-for-point correspondence
between the two scales. Billet ratings on all 681 courses were
performed first, followed by self ratings on the same courses.


Insert Figure 3 here

Section IV: Opinions About the PGC
Program

Section  IV,  the  last  part  of  the  SOBER  questionnaire,  asked  the 
respondents  for  their  personal  opinions  concerning  various  aspects  of 
the  PGC  program.    The  questions  in  this  section  dealt  with  such  topics 
as  the  respondent's  personal  goals  in  relation  to  the  PGC  program,  the 
adequacy  of  certain  program  characteristics,  and  the  acceptability  of 
some  possible  program  alternatives.    Seven-point  scales  were  used  with 
42  of  the  57  items  comprising  this  section,  while  the  remaining  items 
used  scales  containing  two  to  six  points,  depending  upon  the  question. 
Figure  4  shows  two  examples  of  these  scales. 


Insert  Figure  4  here 


Procedures  and  Results 
The SOBER questionnaire was mailed to over 5,600 Coast Guard
officers and warrant officers. Each officer received a package containing
(a)  the  questionnaire,  (b)  a  set  of  answer  sheets,  and  (c)  a  franked 
return  envelope  pre-addressed  to  the  North  Carolina  State  University 
Center  for  Occupational  Education.    The  officers  were  assured  anonymity 
in  their  responses.    Of  the  questionnaire  returns  received  by  the  cut-off 
date,  a  total  of  2,866  (51  percent)  contained  usable  data.    The  numbers 
and  percentages  of  usable  returns  by  grade  level  are  shown  in  Table  1. 


Insert  Table  1  here 


Billet  Requirements 

Descriptive statistical analyses were performed on the billet
knowledge-requirement ratings within each of the seven grade levels.
Table 2 shows some results for the 10 subject areas that were most
frequently required. The cell entries represent the numbers and
proportions of courses in each subject area that were required in each of the
seven grade levels.


As you can see in the bottom line of this table, the number of
required courses increases monotonically with grade level. For example,
18 courses were required by warrant officers, 38 by lieutenants, 71 by
commanders, and 189 by admirals. Language skills and personnel/manpower/
psychology were the two most generally required subjects across grade
levels. The data for these two areas suggest that all ranks require a
core of knowledges and skills in communication, management, and human
relations. Billets in the higher grade levels appear to require an
elaboration of these knowledges and skills, as evidenced by the increased
number of such courses as a function of rank.

For the most part, courses required at one grade level are also
required at the higher levels, suggesting a progression of knowledge
requirements as a function of rank. It appears, moreover, that the
additional course requirements at successively higher ranks represent not
just an elaboration of the core subject areas, but also the introduction of
new areas. Quite evident, for example, are requirements in the higher
ranks for courses in business management, law, and political science/
government, subject areas for which there is relatively little
requirement in the lower grade levels. In more general terms, an examination
of cumulative course requirements with increasing grade level shows
course acquisitions in four additional subject areas between the warrant
officer and ensign/lieutenant, junior grade levels, three additional
areas between lieutenants and lieutenant commanders, one additional area
between commanders and captains, and six additional areas between
captains and admirals. In contrast to the five areas of course
requirements for warrant officers, admirals reported course requirements in 19
different subject areas. The unique configuration of courses at the
admiralty level is assumed to reflect the broad-based responsibility for
decision-making in all areas of Coast Guard activities. This inference
is supported by the fact that the admirals report course requirements in
such areas as business management, accounting/finance, economics,
political science/government, law, and operations research. These courses
are decision-theoretic and can be argued to reflect the decision-making
requirements inherent in their billets.
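
The cumulative count of subject areas can be retraced with a short tally. This is a sketch under stated assumptions: the grade-level labels are paraphrased from the text, and the zero entries for lieutenants and commanders are assumptions, since the text reports no new subject areas at those two transitions.

```python
from itertools import accumulate

# (grade level, number of subject areas first required at that level)
new_subject_areas = [
    ("warrant officer", 5),
    ("ensign / lieutenant, junior grade", 4),
    ("lieutenant", 0),             # assumed: no new areas reported
    ("lieutenant commander", 3),
    ("commander", 0),              # assumed: no new areas reported
    ("captain", 1),
    ("admiral", 6),
]

# Running total of subject areas required up to and including each grade.
cumulative = list(accumulate(count for _, count in new_subject_areas))

for (grade, _), total in zip(new_subject_areas, cumulative):
    print(f"{grade:<36}{total:>3}")
```

Under these assumptions the running total starts at the five areas reported for warrant officers and ends at the 19 areas reported for admirals, so the increments quoted in the text are mutually consistent.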

At this point, I should introduce a qualifying note in connection
with these data. As you know, billet characteristics and requirements
vary within grade levels, and this variation is likely to be quite
substantial. When the billet ratings are averaged within grade levels,
many of the specific billet requirements are masked. This would be
particularly true of the more technologically specific requirements. Thus,
it is important that requirements also be analyzed by specialty area,
by officer billet code, and by billet. Coast Guard Headquarters is, in
fact, currently carrying out such analyses under the direction of Messrs.
Joseph Cowan and Richard Lanterman. To date, they have examined selected
OBC's and performed a cluster analysis of over 2,000 billets. It is
worth noting that in addition to identifying a number of OBC- and
cluster-specific requirements, their analyses support our previous
findings in regard to core course requirements; that is, the language-skill
and management-related requirements appear to be general across billet
clusters and specialty areas as well as grade levels.

Billet  Requirements  Compared  with  Incumbent 
Knowledges 

As mentioned, the SOBER questionnaire respondents also rated their
own levels of knowledge in relation to the 681 course items. These
individual self ratings were averaged within each of some 522 officer
billet codes, yielding a mean "knowledge-resource" vector for each OBC.
Individual billet-requirement ratings were also averaged within the 522
billet categories, producing a "knowledge-requirement" vector for each
OBC. For each OBC, the knowledge-resource and knowledge-requirement
vectors were then compared by means of a "requirement-resource
disparity index" (or RRDI). This index represented the average resource
deficiency per course, for those cases where the requirement estimate
exceeded the resource estimate; that is, the average deficiency among
those courses for which deficiencies were found within the particular
OBC (see Figure 5).
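
One reading of this verbal definition can be sketched as follows. The function names, the return of a zero index when no deficiencies exist, and the sample ratings are all assumptions made for illustration; the formula actually shown in Figure 5 is not reproduced here and may differ in detail.

```python
def mean_vector(ratings):
    """Average individual rating vectors (one per respondent) course by course."""
    return [sum(column) / len(column) for column in zip(*ratings)]


def rrdi(requirement, resource):
    """Return (index, number of deficient courses) for one billet code.

    A course counts as deficient when its mean requirement rating exceeds
    its mean resource (self) rating; the index is the mean shortfall over
    the deficient courses only.
    """
    deficits = [req - res for req, res in zip(requirement, resource) if req > res]
    if not deficits:
        return 0.0, 0  # assumed convention when nothing is deficient
    return sum(deficits) / len(deficits), len(deficits)


# Hypothetical ratings from two respondents in one billet code, four courses.
billet_ratings = [[5, 3, 4, 2], [5, 3, 4, 2]]
self_ratings = [[3, 4, 3, 2], [3, 3, 3, 2]]

requirement = mean_vector(billet_ratings)   # knowledge-requirement vector
resource = mean_vector(self_ratings)        # knowledge-resource vector
index, deficient_courses = rrdi(requirement, resource)

print(f"RRDI = {index:.2f} over {deficient_courses} deficient courses")
```

Because courses with a surplus are excluded rather than averaged in, the index describes how large the typical shortfall is, while the separate count describes how widespread shortfalls are.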


Insert Figure 5 here


In general, both the RRDI value and the absolute number of course
deficiencies tended to increase with grade level; and, consistent with
our earlier findings (see Table 2), the course deficiencies tended to
concentrate in the areas of language skills, personnel/manpower/
psychology, and business management. These results again point to the
increasing importance of certain core knowledges and skills as a
function of rank. As noted in connection with the billet-requirement ratings,
however, the results must also be examined by specialty area, by officer
billet code, and by billet. Such analyses are currently underway at
Coast Guard Headquarters.

Although our efforts in this area were somewhat exploratory, we
believe that the requirement-resource disparity approach should have
some potential use in assessing the educational and training needs of
billets and billet clusters and, possibly, in assigning individuals to
PGC training slots.

Opinions  Concerning  the  PGC  Program 

The final set of analyses in this study was performed on the officers'
responses to the questions about their opinions concerning the PGC program.
The results are summarized in Figure 6.


Insert  Figure  6  here 

Among those respondents who had received PGC training, 87 percent
felt that it had effected a moderate to great increase in their general
performance. The percentage expressing this opinion ranged from 82 percent
for warrant officers to 100 percent for admirals. Thus, at all grade
levels, there appears to be a considerable perceived benefit from PGC
training.

Only 30 percent of the respondents thought that their billets
required a graduate degree. As shown in Figure 6, the percentage
indicating a graduate degree requirement increased monotonically with grade
level, and ranged from 7 percent for warrant officers to 66 percent for
admirals. Professional and managerial requirements were the most frequently
indicated reasons why a graduate degree was necessary in a billet.

The officers' most important educational goals were to improve their
technical specialty skills and managerial capabilities, while the least
important personal considerations were qualifying for licensing and
increasing employability in civilian life. Their most important personal
reasons for seeking PGC training were to extend their general knowledge and,
to a lesser extent, to enhance their promotional opportunities; however,
professional licensing and prestige were unimportant considerations.

Among the various PGC program changes rated by the respondents, the
most acceptable were (a) systematic evaluation of the schools and courses
in the program, (b) periodic reviews of billet training requirements,
(c) greater use of training facilities within commuting distance of the
officer's permanent duty station, and (d) increased emphasis on management
training (a preference congruent with the results of the billet-
requirement analyses). On the other hand, the least acceptable PGC
program changes included (a) the development of a Coast Guard postgraduate
school as an alternative to civilian academic institutions, (b) a shorter
postgraduate program supplemented with off-duty training, and (c)
the civilianization of billets requiring scarce or unusual technical
skills.

It would thus appear that the officers see the present PGC program
as a means of enhancing their personal growth, improving their performance
potential, and facilitating their advancement within the service.
Although  they  favor  a  greater  program  evaluation  effort  and  possibly 
some  changes  in  program  emphasis  and  site  location,  they  do  not  seem  to 
be  seeking  drastic  changes  in  program  philosophy  and  practice. 

Some  Initial  Conclusions 
Several  initial  conclusions  were  drawn  based  on  these  preliminary  analyses. 
These  conclusions  are  presented  as  tentative  recommendations  and  are  meant 
to  be  suggestive  of  the  potential  policy  implications  of  the  data. 

The first conclusion is that all officers should be adequately trained
in the core knowledge areas.  The results of this study indicate that six
language-skill courses, eight courses in personnel/manpower/psychology,
and one business management course are judged to be required at all grade
levels.  All ranks from ensign up, excluding warrant officers, were
judged to require nine language-skill courses, 11 personnel/manpower/
psychology courses, four law courses, one math course, and three military
short courses.  These common requirements would seem to have implications
for both pre-commission and post-commission training activities.

A second possible conclusion is that training opportunities should
be provided at each grade level.  The progression of knowledge
requirements across grade levels argues well for specific training content
oriented to rank.  At each grade level a set of courses can be identified
such that if a course is judged to be required at that level, it will also
tend to be required at each succeedingly higher level.  Under a rank-specific
training approach, training for knowledges not used at a particular rank
might be deferred until that time at which the knowledges become important.

Related to our second conclusion is a third conclusion that rank-
specific training content should be supplemented with billet- or OBC-
specific training content.  As mentioned earlier, the characteristics and
requirements of the billets within a particular grade level may show
considerable variation around the means for that level.  Accordingly, it
becomes necessary to take into consideration the unique requirements imposed
by individual billets, OBC's, or billet clusters.

Our fourth conclusion is that the Coast Guard should consider increasing
the incidence of training opportunities provided each officer.  The
progression  in  the  kind  and  quantity  of  knowledge  requirements  across 
ranks  has  already  been  mentioned.    Knowledge  requirements  apparently 
shift  from  more  technically  and  specifically  oriented  requirements  at  the 
lower  ranks  to  the  more  people-  and  policy-oriented  knowledges  at  the  higher 
ranks.    In  order  for  the  PGC  program  to  be  responsive  to  changing  demands, 
training  content  must  shift  as  a  function  of  this  demand.    Unless  we 
assume  that  training  given  at  any  stage  in  career  progression  will  generalize 
to  subsequent  stages  and  will  provide  for  all  future  knowledge  demands, 
changing  demand  structure  would  appear  to  require  training  at  successive 
points  in  officers'  careers  to  prepare  them  for  subsequent  changes  in 
performance  requirements. 

Our last and most obvious conclusion is that continued use should
be made of the data base obtained in this study as a means of developing
strategies to improve the match between training requirements
and resources.  The data obtained in this study represent a rich source
of information that can be used to make informed decisions about the
[...] training requirements among Coast Guard warrant
and commissioned officer billets.  As noted, however, the analyses
performed in this study were of necessity primarily descriptive and
limited in scope.  A number and variety of additional analyses are needed
to provide insight into the knowledge-requirement structure of these
billets.  Several such analyses are presently being conducted at Coast
Guard Headquarters, and others are planned.

Figure 1.  Examples of course-description items


422.  ORGANIZATION AND MANAGEMENT

Introduction to Management--Role in modern society, the business
organization as a system, management as a process, management
in a changing environment.

Managerial Planning--Establishing objectives, formulating
policy and operating plans, decision-making, organizational
structure and relationships, delegation and decentralization, line and
staff relationships, organization planning and change.

Social Aspects of Organizing--Organization as a social system,
cultural background of organization, status systems, organization
and the individual, staffing the organization.

Direction of the Organization--The employee as a person,
leadership and motivation, communication, employee attitudes.

Controlling Organizational Performance--Basic factors in
control, systems approach to managerial control, dysfunctional
consequences of control, improving effectiveness of control, use of
feedback in control.


571.  FUNDAMENTALS OF WRITING

Review of English Grammar--Parts of speech, sentence
structure, proper usage, punctuation.

Subject Matter of a Composition--Purpose, choosing and
limiting a subject, selecting the major thesis, deciding what to say.

Organization--Basic principles of organization: making and
refining the outline, introduction, ordering the parts of a
composition, climax, conclusion.

Paragraphs--The paragraph as a single idea, paragraph
organization and functions, topic sentences.

Writing Practice--Use of the fundamental principles of writing
in composition of a variety of themes.

Figure 2.  Outline of course-description items


ENGINEERING

Bioengineering/Environmental Engineering
Chemical Engineering
Civil/Construction/Transportation Engineering
Electrical/Electronics/Communications Engineering
Industrial and Management Engineering
Mechanical Engineering
Metallurgical/Materials Engineering
Naval Architecture/Marine Engineering/Ocean Engineering
Engineering Mechanics
Engineering Physics

MATHEMATICS/STATISTICS

Mathematics
Statistics

INFORMATION TECHNOLOGIES

Computer and Information Sciences
Operations Research

BUSINESS/MANAGEMENT/ADMINISTRATION

Accounting/Finance
Business Management
Economics
Personnel/Manpower/Psychology

PHYSICAL SCIENCES

Physics
Chemistry
Other Physical Sciences

ARTS AND LETTERS

Language Skills
Literature/Philosophy
History/Political Science/Government
Law

INDUSTRY TRAINING PROGRAMS AND SELECTED SHORT COURSES

Industry Training Programs and Selected Short Courses

Figure 3.  Response scales used for the billet-
requirement and self ratings

Level of Knowledge Required by the Billet

No knowledge in this area is required by the billet.
Little knowledge in this area is required by the billet.
Some knowledge in this area is required by the billet.
Moderate knowledge in this area is required by the billet.
More than moderate knowledge in this area is required by the billet.
Substantial knowledge in this area is required by the billet.
Almost complete mastery in this area of knowledge is required by the billet.

Level-of-Knowledge Scale

I have no knowledge in this area.
I have little knowledge in this area.
I have some knowledge in this area.
I have moderate knowledge in this area.
I have more than moderate knowledge in this area.
I have substantial knowledge in this area.
I have almost complete mastery in this area.

Figure 4.  Examples of the response scales used with
the attitude and opinion items


Section B

How important are (or were) the following reasons to you in desiring
postgraduate/advanced training?  Use the following scale:

Scale:  Blank = No opinion
        1 = No importance
        2 = Significantly below average importance
        3 = Somewhat below average importance
        4 = Average importance
        5 = Somewhat above average importance
        6 = Significantly above average importance
        7 = Critical importance


Section C

How acceptable do you find the following alternatives to the present
postgraduate/advanced training program?  Use the following scale:

Scale:  Blank = No opinion
        1 = Totally unacceptable
        2 = Moderately unacceptable
        3 = Slightly unacceptable
        4 = Makes no difference
        5 = Slightly acceptable
        6 = Moderately acceptable
        7 = Very acceptable

Figure 5.  The requirement-resource disparity index (RRDI)


The knowledge-requirement and knowledge-resource vectors for each
OBC number provided a basis for estimating the disparity between (a) the
OBC's educational and training requirements and (b) the human resources
in the OBC.  This disparity estimate, termed the "Requirement-Resource
Disparity Index" (RRDI), was computed for each OBC number as follows:

a.    Each  mean  in  the  resource  vector  for  a  specified  OBC  number 
was  subtracted  from  the  corresponding  mean  in  that  OBC's 
requirement  vector. 

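$$
\begin{array}{ccc}
\text{Requirement vector} & \text{Resource vector} & \text{Difference } (d) \\
R_1 & r_1 & R_1 - r_1 \\
R_2 & r_2 & R_2 - r_2 \\
R_3 & r_3 & R_3 - r_3 \\
\vdots & \vdots & \vdots \\
R_{681} & r_{681} & R_{681} - r_{681}
\end{array}
$$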

b.  All positive differences between means (+d) were retained;
all negative differences between means (-d) were discarded.

c.  The positive differences between the means in the two vectors
were summed.

$$\sum(+d)$$

d.  The RRDI value was obtained by dividing the sum of the positive
differences by the number of positive differences.

$$\mathrm{RRDI} = \frac{\sum(+d)}{k}$$

where $\sum(+d)$ = the sum of the positive differences and
$k$ = the number of positive differences.

The resultant RRDI value represents the average difference between
an OBC's requirement and resource estimates per knowledge (or course)
item, for those cases where the requirement estimate exceeds the resource
estimate.  The k value, representing the number of items for which the
requirement exceeds the resource, should also be of interest, as well as
the $\sum(+d)$ value, representing the total estimated short-fall in
knowledge resources.  All three of these values should be considered in
assessing the extent of the educational and training need for a particular OBC.
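
Steps a through d can be sketched in a few lines of Python. This is a minimal illustration rather than the study's own program: the function name `rrdi`, the four-item example vectors, and the choice to treat a zero difference as non-positive are all assumptions made here for the sake of the example (the actual vectors held 681 item means).

```python
def rrdi(requirement, resource):
    """Return (RRDI, k, total shortfall) for one OBC.

    requirement, resource: equal-length lists of item means.
    Only items whose requirement mean exceeds the resource mean count.
    """
    if len(requirement) != len(resource):
        raise ValueError("vectors must have the same length")
    # Steps a and b: subtract resource from requirement, keep positive d only.
    positive = [R - r for R, r in zip(requirement, resource) if R - r > 0]
    k = len(positive)                      # number of positive differences
    shortfall = sum(positive)              # step c: sum of the positive d
    index = shortfall / k if k else 0.0    # step d: average positive d
    return index, k, shortfall


# Hypothetical means for four knowledge items (not data from the study).
requirement_means = [4.0, 2.0, 3.0, 1.0]
resource_means = [3.0, 2.5, 1.0, 1.0]

index, k, shortfall = rrdi(requirement_means, resource_means)
print(index, k, shortfall)   # 1.5 2 3.0
```

Returning all three values together follows the figure's advice that the RRDI, k, and the summed shortfall be read side by side when judging the training need of an OBC.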

Figure 6.  Summary of the respondents' opinions
about the PGC program

What effect has PGC training had on your general performance?
    Moderate to great increase:  87%

Does the billet you rated require a graduate degree?
    Yes:  30%     No:  70%

    WO     Ensign/Lt. JG     Lt.     Lt. Cmdr.     Cmdr.     Capt.     Admiral
    7%     24%               30%     37%           50%       ...       66%

Personal educational goals?

    Most important
        Improve technical specialty skills.
        Improve managerial skills.

    Least important
        Develop competencies for licensing.
        Increase employability in civilian life.

Personal reasons for PGC training?

    Most important
        Expand general knowledge base.
        Enhance promotional opportunities.

    Least important
        Prepare for professional licensing.
        Increase social acceptance and prestige.

Acceptability of various changes in the PGC program?

    Most acceptable
        Evaluation of PGC schools and courses.
        Periodic review of billet training requirements.
        Greater use of facilities within commuting distance.

    Least acceptable
        Development of a Coast Guard PG school as an alternative to
        civilian institutions.
        Shorter PG programs.
        Civilianization of certain billets.

Table 1.  Numbers and percentages of usable SOBER
returns by grade level

Grade Level                                        Number    Percentage

1.  Warrant Officers (WO-1, WO-2, WO-3, WO-4)         560       19.5
2.  Ensigns and Lieutenants, Junior Grade             743       25.9
3.  Lieutenants                                       670       23.4
4.  Lieutenant Commanders                             430       15.0
5.  Commanders                                        295       10.3
6.  Captains                                          153        5.3
7.  Admirals                                           15        0.5

TOTAL                                                2866       99.9

Table 2.

NUMBERS AND PROPORTIONS OF COURSES FROM 10 SELECTED SUBJECT AREAS THAT
WERE REQUIRED IN EACH OF SEVEN GRADE LEVELS.

Column headings:  WO, Ensign/Lt. JG, Lt., Lt. Cmdr., Cmdr., Capt., Admiral.
Only the WO column is legible; the entries for the other six grade levels
could not be recovered.

Course Areas                              WO
                                     Number   Proportion

 1.  Indus. & Mgt. Engineering          0        .00
 2.  Math                               0        .00
 3.  Accounting/Finance                 0        .00
 4.  Bus. Mgt.                          1        .0?
 5.  Econ.                              0        .00
 6.  Pers./Manpwr./Psy.                 8        .35
 7.  Lang. Skills                       6        .67
 8.  History/Pol. Sci./Gov.             0        .00
 9.  Law                                0        .00
10.  Indust. Trng. & Short Courses      0        .00

ALL COURSES                            18        .03



EVALUATING  THE  ARMY  OCCUPATIONAL  SURVEY  PROGRAM  METHODOLOGY: 
ANSWER  BOOKLETS,  QUESTIONNAIRE  LENGTH,  AND  POPULATION  COVERAGE 


Eugene M. Burns


US  Army  Military  Personnel  Center 
200  Stovall  Street 
Alexandria,  Virginia  22332 


THE  VIEWS,  OPINIONS  AND/OR  FINDINGS 
CONTAINED  IN  THIS  REPORT  ARE  THOSE 
OF  THE  AUTHOR  AND  SHOULD  NOT  BE 
CONSTRUED  AS  AN  OFFICIAL  DEPARTMENT 

In an on-going survey program, such as the Army Occupational Survey
Program (AOSP), there exists the opportunity, as well as the need, to
monitor and evaluate the survey methodology.  Periodic evaluation efforts
enable the survey managers to learn systematically from their survey
experience and to improve and refine the survey procedures.  On the basis
of these evaluations, survey managers can modify their procedures to
reduce cost, increase efficiency, or improve data quality.  For example, the
Bureau of the Census extensively evaluates its monthly Current Population
Survey (U.S. Bureau of the Census, 1978).  As an example of how an
evaluation can be accomplished in a military survey program, this paper will
discuss an experiment currently being conducted to evaluate various
aspects of the AOSP survey methodology.
BACKGROUND 

As of early 1978, the AOSP was programmed to survey about 100 Army
enlisted Military Occupational Specialties (MOS) per year.  The main
portion of each survey was an MOS task inventory, but there were also
sections covering background information, equipment, special requirements,
and job satisfaction.  The MOS with fewer than 1,000 members (about two-
thirds of the MOS) were surveyed in their entirety, while the remaining MOS
were sampled.  (More detailed discussions of the AOSP are to be found in
U.S. Department of the Army, 1977.)  In early 1978, three aspects of the
AOSP methodology seemed to be particularly in need of study:

1.  The AOSP answer booklets.  Prior to January 1978, AOSP (then known
as MODB, the Military Occupational Data Bank) had used a single survey
booklet.  Responses were recorded in the booklet next to the questions.
Starting in January 1978, separate question and answer booklets were
introduced for economy reasons.  The separate booklets were expected to be
more difficult to use and, therefore, to yield less reliable data than the
self-contained booklets, but the extent of this difference needed to be
assessed.

2.  Questionnaire length.  Coinciding with the January 1978 answer
booklet change, a 124-item job satisfaction section was added to the
questionnaire.*  Increasing the length of the questionnaire was also expected
to have a deleterious effect on the quality of responses, especially
towards the end of the questionnaire, where the job satisfaction section
was located.  Respondents might be too fatigued to give reliable responses
to a section tacked on to the end of an already lengthy MOS questionnaire.
Research was needed to determine whether the overall quality of responses
to the questionnaire was affected by the addition of the job satisfaction
section and, in particular, whether the job satisfaction section should
be kept as part of the AOSP questionnaire.

* This section was copied from the November 1977 survey of Job and
Career Satisfaction so that individual MOS could be analyzed against
an Army-wide baseline.

3.  Population coverage.  Where sampling was required, AOSP surveys
had relied on quota sampling.  AOSP Project Officers at the installation
level were mailed a number of questionnaires in proportion to the number
of MOS incumbents assigned to their installation.  The Project Officers
were instructed to distribute the questionnaires to "personnel from as
many different grades and duty positions as possible" (U.S. Department
of the Army, n.d.: para 2-2).  At issue was whether a shift to
statistically sounder random sampling would be worth the effort involved
in revamping the established distribution system, which was geared towards
the operationally simpler quota sampling.  The answer would depend, in
large part, on a determination of the established system's effectiveness
in attaining broad population coverage.
STUDY  DESIGN 

The experimental design shown in Figure 1 was proposed to investigate
the effects of different answer booklets and of questionnaire length.  By
sending out the same questionnaire in two different formats (self-
contained and separate answer booklets), the relative reliabilities of the
two methods of recording answers could be determined.  Similarly, by
comparing questionnaires sent out with instructions to omit either the job
satisfaction or the MOS-related sections with questionnaires which were
fully completed, the effect on survey quality of the additional job
satisfaction section could be estimated.  Two types of comparisons were to
be made: (1) between individuals at the same point in time (e.g., between
groups 1-2 and 3-4 at the first administration), and (2) within the same
individuals at two different points in time (e.g., group 1 at the first and
second administrations).  The design in Figure 1 strengthens our ability to infer

Figure 1.  Design for a Study of the Effects of Army Occupational Survey
Program Answer Booklet Formats and Questionnaire Length on
the Reliability of the Survey Data

Study
Group    First Administration             Second Administration

  1      Separate Answer Booklet          Separate Answer Booklet
  2      Separate Answer Booklet          Self-contained Answer Booklet
  3      Self-contained Answer Booklet    Self-contained Answer Booklet
  4      Self-contained Answer Booklet    Separate Answer Booklet
  5      MOS-related Only                 MOS-related Only
  6      Job Satisfaction Only            Job Satisfaction Only
  7      --                               Separate Answer Booklet
  8      --                               Self-contained Answer Booklet
  9      --                               MOS-related Only
 10      --                               Job Satisfaction Only


that observed differences are due to the experimental manipulation (e.g.,
answer booklet format) and not to other factors.  Other factors could
include (1) respondent familiarization with the questionnaire or resistance
to a second questionnaire administration, and (2) changes in the work
performed, reflecting either random monthly variation in tasks or increased
soldier skill and responsibility.  Groups 7-10 were included in the design
to obtain estimates of the amount of change to be expected over the course
of several months among soldiers who had not been exposed to the AOSP
survey.  (For further discussion of the logic of experimental design, see
Campbell and Stanley, 1966.)

EXECUTING THE STUDY DESIGN

Figure 1 describes a tightly controlled textbook experimental design.
However, the design had to be embedded within an established survey
program.  Rather than gloss over the decisions and compromises entailed
by this embedding, they will be described in detail in this section so
that other survey programs may benefit from the AOSP experience.

How Many MOS?  Questionnaires with separate answer booklets were being
produced at the rate of roughly 10 a month, but any self-contained
questionnaire would have to be produced by modifying an existing
separate-answer-booklet questionnaire.  Given the amount of effort involved in
producing a high quality version of the standard booklet, it was decided
to use just one MOS for the evaluation.  Should the findings from one MOS
prove ambiguous, the study could be expanded to more MOS.  Sending several
versions of more than one MOS survey might also unduly burden and confuse
the AOSP Project Officers.

Which MOS?  The decision to base the evaluation on an already existing
questionnaire limited the MOS to one available in the spring of 1978.  In
addition, a large MOS was called for so that the evaluation would not
interfere with the routine AOSP data requirements.  The type of MOS chosen
was not considered very important, although an MOS of paperwork specialists
would not be suitable since these people would be expected to be more
attuned to forms and complicated instructions than the typical soldier.
Taking all criteria into consideration, the MOS which best suited the
evaluation requirements turned out to be Motor Transport Operator (64C).

How to Sample?  As stated above, the customary AOSP sampling procedure
was quota sampling.  It was necessary to decide whether the evaluation
should rely on some more rigorous probability sampling scheme, as called for
in Figure 1's controlled experimental design, or whether it should also
employ quota sampling.  Since the experiment was designed to learn
something about the operation of the on-going survey program, it was
thought best not to make a major departure from the standard sampling
procedures by insisting on a random selection of respondents at the first
administration.  If the 64C respondents were randomly selected from the
64C population, the 64C survey would be unique.  Therefore, quota sampling
was used to select first administration respondents.  However, random
selection of respondents would be absolutely necessary for the second
administration.  By randomly sampling persons who participated in the
first administration, the analysis results could be generalized to that
population.  The second administration control groups were chosen after
the first administration.  Respondent distributions by sex, paygrade, and
education were compared with the population distribution, sampling
fractions were computed, and these fractions were used to randomly select
additional soldiers for the second administration.

Sample Design.  The method for obtaining respondents was chosen so as
to place minimum strain on the AOSP distribution system.  This could be
accomplished by minimizing the number of installations to be affected by
the study, which was done by choosing the eight installations with the
largest 64C populations.  At each of these installations, the regular
AOSP quota was 11 percent, and an additional 11 percent were
chosen for the special conditions.  Each installation chosen received all
four versions of the 64C questionnaire (separate answer booklets, self-
contained answer booklets, MOS-only, and job satisfaction-only).  First
administration questionnaires were distributed through the normal AOSP
distribution channels.  To achieve randomization of respondents among
conditions, standard, MOS-only, and job satisfaction-only booklets were
intermixed in the shipping cartons.  The self-contained version was
shipped separately.

Given the use of quota sampling, there was no firm basis for
determining the appropriate sample size needed for each experimental group.
As a rough rule of thumb, a sample size formula appropriate for random
sampling was used to obtain a number which was then doubled to allow for
attrition between administrations.  At the 95 percent confidence level
(for a normal probability distribution), the sample size was chosen to
obtain a precision of ±0.5 on the seven point scale used to gather task
performance  data.    Using  the  equation 


with  s  estimated  as  2.0,  the  sample  size  obtained  was  64  for  each  of  the 
10  study  groups. 
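
The arithmetic behind the figure of 64 can be checked with a short sketch. The formula used below, n = (z * s / e) squared, is the usual textbook expression for estimating a mean under simple random sampling and is an assumption here, as is rounding the 95 percent normal deviate to z = 2; the function name `sample_size` is likewise illustrative. Under those assumptions, s = 2.0 and a precision of 0.5 reproduce the 64 respondents per group, whereas the unrounded deviate of 1.96 would give 62.

```python
import math


def sample_size(s, precision, z=2.0):
    """Respondents needed to estimate a mean to within +/- precision.

    Assumed formula: n = (z * s / precision) ** 2, rounded up.
    s is the estimated standard deviation; z is the normal deviate.
    """
    return math.ceil((z * s / precision) ** 2)


n_per_group = sample_size(s=2.0, precision=0.5)           # z rounded to 2
n_exact_z = sample_size(s=2.0, precision=0.5, z=1.96)     # unrounded deviate
n_mailed = 2 * n_per_group                                # doubled for attrition

print(n_per_group, n_exact_z, n_mailed)   # 64 62 128
```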

Questionnaire  administrations  were  planned  four  months  apart.  The 
four  month  lag  was  decided  upon  after  debriefing  some  soldiers  after  the 
administration  of  an  earlier  survey. 
RESULTS 

The evaluation described in this paper is still in progress.  The most
serious problem encountered thus far has been in-house personnel
turbulence, which delayed the shipment of the second administration
questionnaire by nearly three months.  As a result, retest plans for the
instruction booklets (MOS-only, job satisfaction-only) were dropped.
One installation was unable to meet its suspense date on
the first administration and was dropped from the study.  Otherwise,
only minor problems were encountered on the first administration.
Table 1 presents the first administration return rates by installation
and booklet type.

The analysis of the returns so far has focused on the
representativeness of the sample by comparing the distribution of returns
(all booklet types) with the 64C population distribution.

It must be noted that no such comparison can prove that questionnaire
respondents were randomly sampled.  Random sampling is a process, not a
result which can be determined by post-hoc measurement.  However, the more
the respondent distribution approximates the population distribution, the
easier it becomes to argue that the sampling procedure is producing
results which are representative of the population.

The first question asked was whether the respondents were distributed
among pay grades proportionate to the 64C population pay grade distribution.
Of the seven installations, four departed significantly (at the .05 level)
from the distribution expected on the basis of proportionate random
sampling, as shown in Table 2.  These four installations included some of the
most conscientious and reliable AOSP Project Officers.  Rather than
reflecting unfavorably upon the AOSP Project Officers' conduct of their jobs, these
departures from the expected distribution should be viewed as stemming from
lack of explicit guidance calling for proportionate sampling.  Summing over

Table 1.  Return Rates by Installation and Booklet Type,
First 64C Administration

                                  Booklet Type

                  Standard          Self-Contained     Special Instructions
Installation    Sent  Accepted      Sent  Accepted       Sent  Accepted

Fort A            43      32          20      18           20      20
Fort B           101      77          50      38           50      38
Fort C            52      52          26      25           26      26
Fort D           149     148          74      70           74      66
Fort E            73      67          36      35           36      69(a)
Fort F            51      51          24      23           24      23
Fort G            57      50          28      28           28      29(a)

Total            526     477         258     237          258     271
                      (90.7%)             (91.9%)              (105.0%)

(a) Some "standard booklet" soldiers were accidentally given instructions to
skip parts of the questionnaire, thus additional booklets were provided.


Table 2.  A Comparison of the Actual 64C Respondent Distribution with the Expected Distribution, by Skill Level and Installation

                   Fort A          Fort B          Fort C          Fort D          Fort E          Fort F          Fort G       All Installations
Skill Level     Actual Expd(a)  Actual(b) Expd(a)  Actual Expd(a)  Actual Expd(a)  Actual Expd(a)  Actual Expd(a)  Actual Expd(a)  Actual  Expd(a)

Skill Level 1
  E1-E2           29   15.87      24   24.32      12   29.26      19   24.85      42   28.38      27   14.39      17   17.72     170   158.16
  E3               4    9.58      33   31.85      23   19.20      57   58.58      34   32.54      23   19.95      25   27.15     199   196.76
  E4              19   27.18      85   85.58      35   35.79     134  125.37      51   73.87      33   37.22      43   39.11     400   423.27

Skill Level 2
  E5              17   12.09      26   24.32      20   12.86      54   50.37      31   24.46      11   17.27      16   14.73     175   156.09

Skill Level 3
  E6               1    3.93       3    5.84       9    3.91      20   23.74      10    7.83       3    5.56       6    5.98      52    55.95

Skill Level 4
  E7               1    2.35       6    5.09       5    2.98       8    9.09       3    3.92       2    4.61       0    2.30      25    30.77

Total             71   71.00     177  177.00     104  104.00     292  292.00     171  171.00      99   99.00     107  107.00    1021  1021.00

Chi-square(c)       21.53           1.71           22.91            3.00           16.37           16.13            2.99            5.82

(a) Expected frequency based on proportionate sampling of the installation 64C population.
(b) Excludes 23 anonymous respondents.
(c) With 5 degrees of freedom, the critical values of chi-square are 9.24 at the .10 level, 11.07 at the .05 level, and 15.09 at the .01 level.

the seven installations included in the study, we see that individual
installation departures cancel each other out, so that the overall
respondent distribution is not significantly different from the expected
distribution. We may speculate that this result is not anomalous, and
that overall AOSP samples generally lack consistent bias in coverage.

The second major question asked involved the distribution of
respondents within pay grades E1 to E4. These pay grades collectively
comprise skill level one under the new Enlisted Personnel Management
System and include 76 percent of the 64C population at the seven
installations. Within skill level one, there are three significant social
groups: male high school graduates, male non-high school graduates, and
females (virtually all of whom are high school graduates). Table 3
presents the results of a comparison of the actual respondent distribution
with the expected distribution for the seven installations. In contrast
with the preceding comparison, only two of the seven installations were
found to depart significantly from the distribution expected on the basis
of proportionate random sampling. These results are consistent with the
hypothesis that, in general, Project Officers select respondents without
regard for sex or educational background.
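
The comparison described above can be illustrated with a short sketch in
Python. It assumes the test statistic is the ordinary Pearson chi-square
(the sum of squared departures divided by expected counts), which agrees
with the values printed in Table 3 for the forts shown. The counts and the
critical values are taken from Table 3 and its footnote; the function and
variable names are illustrative only and are not part of the AOSP
procedures.

```python
# Critical values for 2 degrees of freedom, as quoted in the Table 3 footnote.
CRITICAL_2DF = {0.90: 4.61, 0.95: 5.99, 0.99: 9.21}


def chi_square(actual, expected):
    """Pearson chi-square: sum of (actual - expected)^2 / expected."""
    return sum((a - e) ** 2 / e for a, e in zip(actual, expected))


def departs_significantly(actual, expected, level=0.95):
    """True when the statistic exceeds the tabled critical value."""
    return chi_square(actual, expected) > CRITICAL_2DF[level]


# Skill level one counts from Table 3, ordered as
# (male HS graduates, male non-HS graduates, females).
table3 = {
    "Fort A": ([38, 14, 0], [36.32, 12.88, 2.80]),
    "Fort B": ([119, 16, 7], [105.70, 28.55, 7.75]),
    "Fort D": ([139, 58, 13], [132.56, 65.83, 11.61]),
}

for fort, (actual, expected) in table3.items():
    stat = chi_square(actual, expected)
    flag = departs_significantly(actual, expected)
    print(f"{fort}: chi-square = {stat:.2f}, exceeds p.95 value: {flag}")
```

Comparing against the tabled critical values, rather than computing an
exact probability, mirrors the way the tables report significance.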

Taken as a whole, the findings of the representativeness study
indicate that, while the overall AOSP respondent distribution may be
representative of the MOS, installation level distributions exhibit biases
in the selection of respondents. If installation level results were ever
desired, these biases would require weighting by pay grade to produce
accurate results.
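
One way such weighting by pay grade might be carried out is sketched below
in Python. The paper does not specify a weighting scheme, so the approach
shown (each pay grade weighted by its expected count divided by its actual
count) is an assumption. The counts are the Fort C figures from Table 2;
the per-grade mean responses are hypothetical values invented purely for
illustration.

```python
# Fort C respondent counts by pay grade, as read from Table 2.
ACTUAL = {"E1-E2": 12, "E3": 23, "E4": 35, "E5": 20, "E6": 9, "E7": 5}
# Counts expected under proportionate sampling of the Fort C 64C population.
EXPECTED = {"E1-E2": 29.26, "E3": 19.20, "E4": 35.79,
            "E5": 12.86, "E6": 3.91, "E7": 2.98}


def pay_grade_weights(actual, expected):
    """Return expected/actual for each pay grade that has respondents."""
    return {grade: expected[grade] / n for grade, n in actual.items() if n > 0}


def weighted_mean(grade_means, actual, weights):
    """Combine per-grade mean responses using the pay grade weights."""
    total = sum(actual[g] * weights[g] for g in weights)
    return sum(grade_means[g] * actual[g] * weights[g] for g in weights) / total


weights = pay_grade_weights(ACTUAL, EXPECTED)

# Hypothetical mean responses per pay grade (illustrative, not survey data).
hypothetical_means = {"E1-E2": 2.0, "E3": 2.5, "E4": 3.0,
                      "E5": 3.5, "E6": 4.0, "E7": 4.5}

unweighted = (sum(hypothetical_means[g] * ACTUAL[g] for g in ACTUAL)
              / sum(ACTUAL.values()))
weighted = weighted_mean(hypothetical_means, ACTUAL, weights)
print(f"unweighted mean = {unweighted:.2f}, weighted mean = {weighted:.2f}")
```

A pay grade with no respondents at an installation cannot be weighted in
this way; the sketch simply omits it, and in practice adjacent grades would
probably need to be pooled.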


Table 3.  A Comparison of the Actual 64C Respondent Distribution for Skill Level One (E1-E4) with the Expected Distribution, by Sex, Civilian Education, and Installation

                              Fort A            Fort B            Fort C            Fort D            Fort E            Fort F            Fort G       All Installations
Sex and Education Group    Actual  Expd(a)Actual(b)  Expd(a)   Actual  Expd(a)   Actual  Expd(a)   Actual  Expd(a)   Actual  Expd(a)   Actual  Expd(a)   Actual  Expd(a)

Males
  High School Grads            38    36.32      119   105.70       38    43.83      139   132.56       78    78.83       51    48.15       48    49.60      511   498.14
  Non-HS Grads                 14    12.88       16    28.55        8    13.94       58    65.83       33    32.73       20    24.40       35    32.84      184   206.47

Females                         0     2.80        7     7.75       24    12.23       13    11.61       16    15.44       11     9.45        2     2.56       73    63.39

Total, All Groups              52    52.00      142   142.00       70    70.00      210   210.00      127   127.00       82    82.00       85    85.00      768   768.00
Chi-square(c)                2.98              7.26             14.63              1.41              0.03              1.22              0.32              4.23

(a) Expected frequency based on proportionate sampling of the installation 64C Skill Level One population.
(b) Excludes 19 anonymous respondents.
(c) With 2 degrees of freedom, p.90 = 4.61, p.95 = 5.99, and p.99 = 9.21.




Plans are being formulated to extend these analyses to MOS to be
surveyed during 1979 and to incorporate some of these quality control
measures into the survey program. By studying installation sampling
patterns over several surveys, it should be possible to determine where
corrective measures such as providing feedback and/or additional guidance
to project officers should be applied.

The representativeness study was able to disclose patterns within the
64C respondent returns which were not apparent in the day-to-day operation
of the AOSP. It is anticipated that the answer booklet and questionnaire
length studies will similarly highlight aspects of AOSP methodology which
might have gone unnoticed, or poorly noticed, without special effort at
evaluation. An important result of the study has been the decision to
incorporate some of these quality control measures into the survey
procedures as a continual (rather than one-shot) methods evaluation. With
each survey completed, on-going survey programs receive many opportunities
to learn how to improve themselves. Statistical self-evaluations, such as
those outlined in this paper, can be a valuable tool in taking advantage
of those opportunities to learn systematically from experience.
THE USE OF JOB SATISFACTION DATA IN THE OCCUPATIONAL SURVEY PROGRAM


John X. Olivo, Captain, USAF
and

Elena J. Weber, Captain, USAF

USAF  Occupational  Measurement  Center 
Occupational  Survey  Branch 
Lackland  AFB  TX  78236 

Each year the USAF Occupational Measurement Center conducts
occupational analyses of 51 USAF airmen career ladders. The career ladders
analyzed during any calendar year vary from flight engineer to still
photographer to dental technician. The data from these various career
ladders are collected using a survey instrument which is divided into
three parts: 1) specific biographical information about the survey
respondent; 2) questions concerning the individual's job; and 3) a
detailed listing of tasks. This paper will deal with the job
satisfaction data collected in part two of the survey instrument. The four
indices used to collect the job satisfaction data will be discussed
first, followed by a brief review of the procedures used to compile the
1977 data. Next, uses of the data and trends noted from the 1977 data
will be discussed. Finally, some applications of the data both within
occupational surveys and also in training and management areas will be
reviewed.

Four indices are used in a USAF job inventory to collect data
concerning job satisfaction. The first is perceived job interest. Here
the respondent is asked to rate how interesting he or she perceives his
or her job to be on a seven point scale ranging from Extremely Dull to
Extremely Interesting. The next two indices are perceived utilization of
talents and training. A seven point scale which ranges from Not At All to
Perfectly is used for these two indices. The final index of job
satisfaction on the inventory is reenlistment intentions. Here the
respondent is asked if he or she plans to reenlist. A four point scale
ranging from No to Uncertain to Yes is used for this question.

This is the third year in which job satisfaction data has been
compiled and used for comparison purposes with on-going surveys. Each
year the format used to report the data has been changed. The 1975 data
on survey respondents were combined with no divisions by time-in-service
or career area group. The 1976 survey data were separated into two
time-in-service groups, 1 to 48 months total active federal military
service (TAFMS) and 49 plus months TAFMS. However, the 1977 data were
sorted both by time-in-service and by career area groups. The three
time-in-service groups used in the 1977 summary statistics were 1-48
months TAFMS, 49-96 months TAFMS, and 97+ months TAFMS. This appeared to
give the user sufficient distinction between the various time-in-service
groups.
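
The two grouping schemes can be expressed as a short Python sketch. The
month boundaries come from the text; the function names and the sample
list of TAFMS values are hypothetical and serve only to illustrate the
sorting.

```python
from collections import Counter


def tafms_group_1977(months):
    """Map months of TAFMS to the three groups used for the 1977 data."""
    if months < 1:
        raise ValueError("TAFMS must be at least 1 month")
    if months <= 48:
        return "1-48"
    if months <= 96:
        return "49-96"
    return "97+"


def tafms_group_1976(months):
    """Map months of TAFMS to the two groups used for the 1976 data."""
    if months < 1:
        raise ValueError("TAFMS must be at least 1 month")
    return "1-48" if months <= 48 else "49+"


# Hypothetical TAFMS values for a handful of respondents.
sample_months = [6, 30, 48, 49, 80, 96, 97, 150, 230]

print(Counter(tafms_group_1977(m) for m in sample_months))
print(Counter(tafms_group_1976(m) for m in sample_months))
```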


The problem of grouping the various Air Force specialties into
career area groups was more difficult to resolve. An authoritative
source document on which to base the groupings was necessary. It was
decided to use AFM 26-3, Air Force Manpower Standards, (Vols II-V) as a
basis for grouping the various career fields. The 67 enlisted specialties
used for the 1977 summary were divided into seven groups. These were:
Aircrew; Mission Equipment Operations; Mission Equipment Maintenance;
Command Support; Direct Support; Medical; and Special Duty Identifiers.
The list of the various Air Force Specialties comprising each of the seven
groups is attached at the end of this paper.

The data are presented in a series of tables. Tables 1-3 present
composite pictures of each of the three TAFMS groups by career area.
This allows for easy identification of differences in each of the four
job satisfaction indices from career area to career area for each of the
three time-in-service groups.

The job satisfaction data presented in these tables have routinely
been included as part of the occupational survey report (OSR), although
analyses of the data or plausible explanations for the data are not part
of the report. The data are also presented for each of the job groups
identified within the career ladder or career field being surveyed and for
time-in-service groups. Results from a particular field are then compared
to the USAF average for the previous year to see if any large deviations
exist. Large variations are highlighted in occupational survey reports.
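
A comparison of this kind might be sketched in Python as follows. The paper
does not state how large a deviation must be to count as "large", so the
10 percentage point threshold is an assumption. As a stand-in for the
previous year's USAF average, the sketch uses the total sample column of
Table 1 (1-48 months TAFMS) as the baseline and the Mission Equipment
Operations column as the group under review; the talents and training
figures combine the "fairly well to very well" and "excellently or
perfectly" rows.

```python
# Percent responding favorably, total sample column of Table 1.
BASELINE = {"job_interest": 65, "talents": 63 + 6, "training": 66 + 8}

# Same indices for the Mission Equipment Operations column of Table 1.
MISSION_EQUIPMENT_OPERATIONS = {"job_interest": 50,
                                "talents": 53 + 3,
                                "training": 67 + 7}


def large_deviations(group, baseline, threshold=10):
    """Return indices whose difference from baseline meets the threshold.

    The threshold (in percentage points) is an assumed value, not one
    given in the paper.
    """
    flagged = {}
    for index, base in baseline.items():
        diff = group[index] - base
        if abs(diff) >= threshold:
            flagged[index] = diff
    return flagged


flagged = large_deviations(MISSION_EQUIPMENT_OPERATIONS, BASELINE)
for index, diff in flagged.items():
    print(f"{index}: {diff:+d} points relative to baseline")
```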

In previous years the data had been arranged so that little direct
comparison could be made. Arranging the 1977 job satisfaction data to
reflect time-in-service and career area groups has allowed more direct
comparisons to be made between current and previous surveys. For
example, personnel with 49 to 96 months TAFMS in the administration
career ladder, a specialty in the direct support career area, can be
compared directly to other personnel with the same time-in-service from
the direct support career area surveyed the previous year.

Several interesting trends were noted within the 1977 data. It had
been assumed that when the data were organized by career area groups
there would be some variance in each of the indices from career area to
career area. The assumption had been that clerical administrative
personnel would not find their job as satisfying as would the dental
technicians. The data, however, showed that across the career area
groups the level of job satisfaction, perceived utilization of talents
and training, and reenlistment intentions were fairly consistent. The
major differences that occurred were between time-in-service groups, not
career field groups. There typically was a slight (less than five
percent) increase in job satisfaction from the 1 to 48 months TAFMS
respondents to the 49 to 96 months TAFMS respondents. However, the
increase between the 49 to 96 months TAFMS respondents and those with
97+ months TAFMS was fairly large, generally about ten percent. Again,
the implications of these differences are not discussed in the OSR.
Force managers, however, might and do find such data invaluable, and the
Occupational Measurement Center is always ready to assist in interpreting
and using these data.


There also appeared to be little connection between reenlistment
intentions and the other three job satisfaction indices. For survey
respondents with 1 to 48 months TAFMS approximately three-fourths of the
respondents in each career area group found their job interesting and
felt their talents and training were being used fairly well or better;
yet, less than half (46 percent) planned to reenlist. A good example
was operating room personnel (AFS 902X2). While 80 percent or more of
the first enlistment personnel found their job interesting and felt
their talents and training were being used fairly well or better, only
35 percent planned to reenlist. This trend continued with the second
term groups. Only among personnel with 97+ months TAFMS were the responses
to the four indices fairly consistent.

Another trend noted was that across all career area groups the
level of job satisfaction was fairly consistent except for aircrew
personnel. The level of job satisfaction among these personnel in each
of the three time-in-service groups was well above that reported by
incumbents in any other career area group. Unlike other career area
groups, however, the aircrew personnel showed little, if any, increase
in job satisfaction from one time-in-service group to the next. The
only index that did increase markedly was the reenlistment intention.

Currently there are several agencies which use the job satisfaction
data collected in occupational surveys. The Air Force Human Resources
Laboratory at Brooks AFB, TX has continually used this data for a number
of research projects. Headquarters Air Training Command at Randolph
AFB, TX is attempting to develop some correlation between job satisfaction
data and reenlistment rates to determine training effectiveness. Within
the occupational survey program this data is primarily collected and
reported for each individual specialty being surveyed. Occupational
analysts sometimes find job groups within specialties which have
consistently different ratings on the job satisfaction indices than other
career ladder job groups. This might serve as another indicator for
identifying job type groups. In addition, analysts also report
differences for a particular specialty when compared to the other
specialties within that career area group.

The job satisfaction data offers several areas for further research.
One area would be to compare job satisfaction data among each year group
within the 1-48 TAFMS group. Along this same line, personnel with 192
to 240 months TAFMS (the 16 to 20 year group) could be grouped individually
and then compared to personnel with 97 to 191 months TAFMS. A second
area of consideration would be a statistical analysis to determine
whether in fact there are significant differences in job satisfaction
data among the various career areas. Also, the relationship between
Airmen Qualification Examination (AQE) scores and job satisfaction data
should be further explored; if a relationship does exist, it would
provide another piece of information that would help in understanding the
complex work motivation issue.
Summary


The job satisfaction data collected from surveys conducted in 1977
were reported for time-in-service and career area groups. These data
are routinely reported as part of the occupational survey report. While
no detailed examination of the data is made, large deviations from other
groups within the study or from the averages of the previous year are
reported. These large deviations can sometimes be an aid in job typing.
One consistent result is a low relationship between reenlistment
intentions and the other three job satisfaction indices. In addition to
OMC, the job satisfaction data is used by HQ/ATC, AFHRL, and force
managers at AFMPC and the Air Staff. Finally, this data provides areas
for future research into such issues as changing patterns in job
satisfaction among year groups in the first four years of an Air Force
career, determining the level of significance in job satisfaction among
the various career areas, and the relationship between AQE scores and job
satisfaction.
TABLE 1

EXPRESSION OF JOB INTEREST, PERCEIVED UTILIZATION OF TALENTS AND TRAINING AND REENLISTMENT INTENTIONS
BY PERSONNEL WITH 1-48 MONTHS TAFMS SURVEYED DURING 1977*

                                                           MISSION     MISSION
                                     TOTAL               EQUIPMENT   EQUIPMENT     COMMAND      DIRECT
                                    SAMPLE     AIRCREW  OPERATIONS MAINTENANCE     SUPPORT     SUPPORT     MEDICAL

I FIND MY JOB:
  DULL                                  16           3          25          17          12          14          15
  SO-SO                                 19           6          25          21          15          14          15
  INTERESTING                           65          91          50          62          73          72          70

MY JOB UTILIZES MY TALENTS:
  NOT AT ALL OR VERY LITTLE             31          14          44          32          25          28          30
  FAIRLY WELL TO VERY WELL              63          76          53          64          64          63          62
  EXCELLENTLY OR PERFECTLY               6          10           3           4          11           9           8

MY JOB UTILIZES MY TRAINING:
  NOT AT ALL OR VERY LITTLE             26          14          26          26          20          25          17
  FAIRLY WELL TO VERY WELL              66          64          67          67          67          64          69
  EXCELLENTLY OR PERFECTLY               8          22           7           7          13          11          14

DO YOU PLAN TO REENLIST:
  NO OR PROBABLY NO                     59          44          51          61          57          58          62
  YES OR PROBABLY YES                   41          56          49          39          43          42          48

* TO OBTAIN A REPRESENTATIVE SAMPLE, THE COMMAND SUPPORT AND MEDICAL AREAS CONTAIN RESPONSES COLLECTED
  DURING 1976 AND 1977


TABLE 2

EXPRESSION OF JOB INTEREST, PERCEIVED UTILIZATION OF TALENTS AND TRAINING AND REENLISTMENT INTENTIONS
BY PERSONNEL WITH 49-96 MONTHS TAFMS SURVEYED DURING 1977*

                                                           MISSION     MISSION
                                     TOTAL               EQUIPMENT   EQUIPMENT     COMMAND      DIRECT
                                    SAMPLE     AIRCREW  OPERATIONS MAINTENANCE     SUPPORT     SUPPORT     MEDICAL

I FIND MY JOB:
  DULL                                  13           3          27          12          11          16          14
  SO-SO                                 16           8          19          16          15          16          11
  INTERESTING                           71          89          54          72          74          68          75

MY JOB UTILIZES MY TALENTS:
  NOT AT ALL OR VERY LITTLE             23          14          38          21          19          28          23
  FAIRLY WELL TO VERY WELL              68          70          57          71          70          62          66
  EXCELLENTLY OR PERFECTLY               9          16           5           8          11          10          11

MY JOB UTILIZES MY TRAINING:
  NOT AT ALL OR VERY LITTLE             24          11          28          22          18          28          18
  FAIRLY WELL TO VERY WELL              66          63          64          68          71          63          67
  EXCELLENTLY OR PERFECTLY              10          26           8          10          11           9          15

DO YOU PLAN TO REENLIST:
  NO OR PROBABLY NO                     35          24          25          35          39          34          32
  YES OR PROBABLY YES                   65          76          75          65          61          66          68

* TO OBTAIN A REPRESENTATIVE SAMPLE, THE COMMAND SUPPORT AND MEDICAL AREAS CONTAIN RESPONSES COLLECTED
  DURING 1976 AND 1977


TABLE 3

EXPRESSION OF JOB INTEREST, PERCEIVED UTILIZATION OF TALENTS AND TRAINING AND REENLISTMENT INTENTIONS
BY PERSONNEL WITH 97+ MONTHS TAFMS SURVEYED DURING 1977*

                                                           MISSION     MISSION
                                     TOTAL               EQUIPMENT   EQUIPMENT     COMMAND      DIRECT
                                    SAMPLE     AIRCREW  OPERATIONS MAINTENANCE     SUPPORT     SUPPORT     MEDICAL

I FIND MY JOB:
  DULL                                   9           4          14           9          10          10           8
  SO-SO                                 10           7          13          11           8          10           9
  INTERESTING                           81          89          73          80    [illeg.]          80    [illeg.]

MY JOB UTILIZES MY TALENTS:
  NOT AT ALL OR VERY LITTLE             15           8          23          14          16          17          12
  FAIRLY WELL TO VERY WELL        [illeg.]          65    [illeg.]    [illeg.]    [illeg.]    [illeg.]          66
  EXCELLENTLY OR PERFECTLY        [illeg.]          27          13          18    [illeg.]          21          22

MY JOB UTILIZES MY TRAINING:
  NOT AT ALL OR VERY LITTLE             19           8          25          18          18          22          12
  FAIRLY WELL TO VERY WELL              61          62          60          63          57          60          63
  EXCELLENTLY OR PERFECTLY              20          30          15          19          25          18          25

DO YOU PLAN TO REENLIST:
  NO OR PROBABLY NO                     27          20          31          28          27          27          23
  YES OR PROBABLY YES                   73          80          69          72          73          73          77

* TO OBTAIN A REPRESENTATIVE SAMPLE, THE COMMAND SUPPORT AND MEDICAL AREAS CONTAIN RESPONSES COLLECTED
  DURING 1976 AND 1977
LISTING OF MAJOR GROUPING AFSs


AIRCREW

 1.  111X0  Defense Aerial Gunner
 2.  112X0  In-Flight Refueling Operator
 3.  113X0  A/C Flight Engineer
 4.  114X0  Aircraft Loadmaster
 5.  115X0  Pararescue Recovery


MISSION EQUIPMENT OPERATIONS

 1.  20XXX  Intelligence
 2.  27XXX  Command Control Systems Operations
 3.  29XXX  Communications Operations


MISSION EQUIPMENT MAINTENANCE

 1.  30XXX  Communications Electronics Systems
 2.  31XXX  Missile Electronic Maintenance
 3.  32XXX  Avionics Systems
 4.  34XXX  Training Devices
 5.  36XXX  Wire Communications Systems Maintenance
 6.  40XXX  Intricate Equipment Maintenance
 7.  42XXX  Aircraft Systems Maintenance
 8.  43XXX  Aircraft Maintenance
 9.  44XXX  Missile Maintenance
10.  46XXX  Munitions and Weapons Maintenance


COMMAND SUPPORT

 1.  10XXX  First Sergeant
 2.  24XXX  Safety
 3.  65XXX  Procurement
 4.  66XXX  Logistics Plans
 5.  67XXX  Accounting and Finance
 6.  69XXX  Management Analysis
 7.  70XXX  Administration
 8.  71XXX  Printing
 9.  73XXX  Personnel
10.  74XXX  Morale, Welfare, and Recreation
11.  79XXX  Information
12.  87XXX  Band


DIRECT SUPPORT

 1.  22XXX  Photomapping
 2.  23XXX  Audiovisual
 3.  25XXX  Weather
 4.  39XXX  Maintenance Management Systems
 5.  47XXX  Vehicle Maintenance
 6.  51XXX  Computer Systems
 7.  54XXX  Mechanical/Electrical
 8.  55XXX  Structural/Pavements
 9.  56XXX  Sanitation
10.  57XXX  Fire Protection
11.  59XXX  Marine
12.  60XXX  Transportation
13.  61XXX  Supply Services
14.  62XXX  Food Services
15.  63XXX  Fuels
16.  64XXX  Supply
17.  75XXX  Education and Training
18.  81XXX  Security Police
19.  82XXX  Office of Special Investigations and Counterintelligence
20.  92XXX  Aircrew Protection


MEDICAL

 1.  90XXX  Medical
 2.  91XXX  Medical
 3.  98XXX  Dental


SPECIAL DUTY IDENTIFIERS (SDIs)

 1.  99500  Recruiter
 2.  99501  Engineering or Scientific Assistant
 3.  99502  Military Training Instructor
 4.  99503  United States Air Force Honor Guard
 5.  99504  LGM-30 Facility Manager
 6.  99505  Courier
 7.  99506  Combat Information Monitor
 8.  99508  Scatter Communications Maintenance Technician
 9.  99509  Data Formatting Equipment Operator
10.  99600  Student Training Advisor
11.  99601  ICBM Maintenance Manager
12.  99602  Sensor Operator
13.  99603  Minuteman NCO Code Controller
14.  99604  Postal Specialist
General Overview and Initial Findings of the Project on Job
Satisfaction and Retention of U.S. Army Enlisted Personnel

I.    Overview of the Job Satisfaction and Retention Project

The US Army Military Personnel Center's (MILPERCEN) job
satisfaction and retention project was described at the 19th
Annual Conference of the Military Testing Association held in
San Antonio, Texas in October 1977. The primary intent of
today's presentation is to update the status of this project
during the past year as well as to recapitulate its scope and
what has been achieved up to now. This overview will consist
of the following: (1) the environment in which the project
was initiated; (2) project phases; and (3) the intended uses.

A.  Context of the Project.

Since 1968, MILPERCEN, through its Army Occupational Survey
Program (AOSP), has systematically conducted occupational analysis
of enlisted military occupational specialties (MOS). In the
fall of 1974, a job satisfaction section was added to each of
its Army MOS survey questionnaires. This section consisted of
nineteen measures used to operationally define and empirically
measure satisfaction with one's Army job and with military life.
The definitions used essentially comprised the motivator factors
(intrinsic to one's job) and the hygiene factors (extrinsic
to one's job, relating to one's work environment) that Frederick
Herzberg identified in his research on job satisfaction (Herzberg,
Mausner, and Snyderman, 1959).
These nineteen factors, as shown in Table 1, provided very
incomplete coverage of those factors which could potentially
have a significant influence on job and career satisfaction.
Moreover, these original factors did not pertain directly to
reenlistment intent. Consequently, the job satisfaction
portion was expanded to more thoroughly examine the relationship
of job satisfaction (work attitudes) to the retention (decision
to stay or leave the service), unit morale and duty performance
of enlisted personnel. This was based primarily on research
conducted at the Air Force Human Resources Laboratory (Alley
and Gould, 1975). Interest centered on the relationship between
job and career satisfaction and first-term reenlistments.

This expansion, constituting the initial phase of this
project, was part of the Army's overall effort to gain additional
insights into retention, job satisfaction and the all-volunteer
Army. The primary goal is to improve the Army's ability to
recruit and retain an adequate number of quality soldiers.

As Tuttle and Hazel have noted, the majority of the research
and applications concerning job satisfaction have occurred in
industry (Tuttle and Hazel, 1974). Within the past ten years,
however, the military has begun to apply research findings from
the private sector and to sponsor its own research in this area.
MILPERCEN's efforts in this area related to the quality of life
research conducted by the Air Force Human Resources Laboratory
and the Naval Personnel Research Laboratory. As previously
indicated, this job and career satisfaction project

aos^l-Y  (^j§|MB|3p§..^SrAiR:-FeRee':s!j^eRfi«^Y(Ad^ 

"[^Sk-rED.  THIS  Jo:       ::^s^^r'^    '■^-i'-'i  '  ' 

CLZ^^^^m^L^  m  9f{^^S.rS  APPROACH  (ALU  "  AtfJ)  I-  -D. 

iUR  PROJECT^  WHICH  HAS  BEEN  PR06RAM^ED  TO  COKT  TNUE  ^mo 
FISC^  Y^,^^f^  ^N|il9T£D  or  THE  FOLi.0WINC  INTER- 
REUT|9,  P^^P^jCjKRp^  Af^V^VtlDE  AT¥lTUDfi^^  ^  1M  ''^ 

,   j;^.A{^)^^§gS(9^RA^  ASMY^KIDE  SAWPtBOSuft^^^if:  )a»ff5feMATELY 

^^OOuPSWiiS!  S9K9^9T?D  in  August  1976.  This  b^oi?j  was  confined 
TO  Fii?|y-j^j^Y§!8se^*M|^^  ANf^.PAv^(^B«DBS^it5  Awe^^  iNft^'^Soviiiii)' 

L^§,Tfft^ij!j-5^lj^||/)l|rp^cR^  tf^J^^htWPtfps 

i^l  AnA^YS  I <?F  RE?;PON§f  S  OF.^  ATPftDXIMATELy  IfCcECS^S 
ARIj!^-^ip|j^jy^^|gRiJAI?||(iJJ7?  !rOr;AN;j8§  ITEM  QUESTJ*^  '*-.RE,  ThIS 
EFFORT, 59Np^^fy^|§       All86VJ6TED'^VBR8t0N  0l*"^tV"  ^S^^Fbkfcfe's' 
Oc^5y^TJ/JjJ^L.;ftT7:fei^ftft;J'Ny6(JTORY.  ALffl0USfi^'M§fcl^  2B  fO  X^COUNt 
F0|  JJfjEEE^^^Tl^H^^ISi;  im  -AND        'AH^  ^^uT    ^  %s'tjtf S  ^ 

OF  TH^^,  STI^Y^  VftilCH  .WILU  Ai*SO  BE  DB^CRrBKD  m-T^^^S  PR^isj^NtAtlDN. 
AR|  TfP  jJ^J>^|^Hf%^lf(,  T^^  FIRST  CUARTER  <»^  V 132351  LVEAR  "1979; 
ThI^  AN^Y|I^,j^j?^^:  BO?nH  PI:|^BT1^F!^f  WMfei S^tofERS  AND  ' 
^°W-TO^  ^lALtlSl^JDAWi)  iilASTrSftTI®Wril«fiA%lPEC^  6'f  Wf  tiFE  " 


mat msKj (b) the best predictors of job satisfaction,
reenlistment intent, and unit morale; (c) the most important reasons
for reenlistment and separation; and (d) the relationship between
reenlistment intent and reenlistment decision.

(3) Analysis of occupational survey data collected from
MOS 00E (Recruiter) and MOS 79D (Career Counselor) in the spring
of [illegible]. This project relates the perceptions of 1100 recruiters
and [illegible] career counselors to those of first-term soldiers on
matters associated with enlistment and separation. This report
is also scheduled for publication in the first quarter of fiscal
year 1979.

(4) An Army-wide survey conducted in November 1977 of
approximately 11,000 first-term and career force men and women.
This 362 item questionnaire, representing the end product of
over one year of developmental work, addresses the issues of
job satisfaction, reenlistment intent, unit morale and recruiter
accuracy. It also covers the importance of factors related to
enlistment, separation or retirement, and reenlistment. Analysis
has commenced recently. Initial results (covering the importance
to enlistment, reenlistment and separation of the first-term
force) are to be published during this quarter. Subsequent
analyses will be published incrementally throughout fiscal year
1979.

C.    Intended Uses.

The two principal uses of the original job satisfaction
section contained in the Army Occupational Survey Program
questionnaire, which followed a Herzberg-based approach, were:
(1) to determine the degree of satisfaction/dissatisfaction
between and within different occupational specialties,
particularly if these MOS were identified as "problem" MOS (due
to factors such as a large imbalance between authorized and
operating force strengths, imbalance between CONUS and overseas
authorizations, a large number of personnel expressing
dissatisfaction with their job, intending to separate or retire, and/
or spending a majority of their time on non-duty related work);
and (2) to amplify other data collected in the questionnaire,
inclusive of duty/task information and special knowledges and
requirements.

Results obtained from this expanded job satisfaction and
retention project are intended primarily to meet the needs of
key Army decision-making agencies (e.g., the Office of the
Deputy Chief of Staff for Personnel - Recruitment and Reenlistment
Division, and the Enlisted Promotions and Separation Branch of
the Enlisted Division) as well as those of career counselors
(reenlistment NCOs throughout the Army).

It was also intended that this project be linked to related
research conducted by other Army agencies and other services
within DOD.

To accomplish these objectives, considerable time
was devoted to assessing the nature and extent of other completed
studies or those in progress within DOD pertaining to job
satisfaction and reenlistment.
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The  outcome  of  this  assessient  was  the  following  list  of 
USES  for  the  data  analyzed  in  lais  project: 

Examination  of  relationshits  between  job  satisfaction  and: 

- Retention (particularly of first-term personnel)

- Unit morale

- Occupational mismatch

- Effective use of trained assets

- Selected studies (e.g., women in the Army)

II.  The  August  1976  Army-Wide  Survey 
A.  Introduction. 

The first phase of this job satisfaction and retention
project consisted of analysis of a survey distributed to a random
sample of personnel Army-wide in August 1976. Although this
questionnaire contained 80 items, only 38 were analyzed, including
the 17 independent and two dependent factors (overall job
satisfaction and reenlistment intent) used in the job satisfaction
portion of the AOSP questionnaires for which data have been
collected since 1974. The other 19 factors in the questionnaire
used in this analysis were those inserted by other Army agencies
for their own specific purposes. It should be noted that all
80 items were cast in final form prior to the initiation of
this project. Since this questionnaire was a composite
representing the needs of different agencies, it was not
designed to be a "comprehensive" instrument for measuring the
primary factors influencing these two criterion measures. As
previously stated, coverage of factors with the potential of
measuring reenlistment intent was minimal through use of the
19 items used in the AOSP. With the addition of these 19
other factors, coverage of factors that could measure reenlistment
behavior was substantially improved but not complete. In
subsequent surveys (e.g., the February 1977 Army-wide Survey,
which will be discussed later in this presentation), the major
defects in the coverage of reenlistment related factors, and
to a lesser extent in the coverage of job satisfaction related
measures, have been reduced considerably. The
analysis of the August 1976 survey was based on 3,679 personnel
in paygrades E-3 and E-4 in their initial term of enlistment.

B.    Significant Findings and Conclusions
1.   The factor "My work is interesting", one of the 17
original independent factors in the AOSP measured on a five
point scale ranging from "none of the time" to "all of the time",
emerged as the best predictor of both reenlistment intent and
job satisfaction. This finding was noted for E-3's and E-4's
separately, males and females, non-high school graduates,
high school graduates, whites and blacks, and single and married
personnel. It is noted that this factor (intrinsic to one's job)
appeared to exert much more influence on reenlistment intent as
well as job satisfaction than factors pertaining to one's
career, particularly monetary-related factors comprising military
pay, allowances, and benefits. In view of the need of the Army
to reduce personnel-related costs while increasing the retention
rate of qualified personnel, especially under the All Volunteer
Force, making jobs more attractive could be extremely desirable.
It was also determined that the expressed reenlistment intent
of first-term personnel was highly correlated with actual
reenlistment decision. This was especially true as they approached
the decision point regarding reenlistment. Similar studies
conducted by the U.S. Navy and the U.S. Air Force on first-term
personnel have also shown very high correlations.
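
One common way to put a number on the kind of association described
above, between stated reenlistment intent and the actual decision,
is the phi coefficient for a two-by-two table. The sketch below is
illustrative only: the counts are hypothetical, not figures from
this survey, and the paper does not say which correlation statistic
was used.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table of counts.

    a = intended to reenlist and did
    b = intended to reenlist but separated
    c = did not intend to reenlist but did
    d = did not intend to reenlist and separated
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts, for illustration only (not survey data).
phi = phi_coefficient(80, 20, 10, 90)
print(round(phi, 3))
```

Values near 1 indicate that stated intent and the actual decision
almost always agree; values near 0 indicate no association.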

2.  Regular Military Compensation (the sum of basic pay,
quarters and subsistence allowances or equivalent, and federal
income tax advantage compared to salary/wages made in civilian
life), not one of the original factors used in the AOSP, was
generally a consistent predictor of reenlistment intent but to
a lesser extent than work interest. This was true regardless
of the soldier's sex or race. This also applied to E-4's, single
personnel, and high school degree graduates, but not their
complements.

3.  Work importance, work challenge, and working association
with one's supervisors were relatively consistent predictors
of job satisfaction in terms of grade, sex, educational level,
race, and marital status.

4.   Soldiers who felt they were given accurate information
by their Army recruiter had a significantly higher intention to
reenlist and had significantly greater job satisfaction than those
who didn't. The belief that Army recruiters told the truth
about Army life does not necessarily imply that they either
truly represented or misrepresented the facts about Army life.
What this indicated was the extent to which the expectations
of the individual correspond to the information imparted to
him/her by the Army recruiter. Those individuals more likely
to accept Army life for what it really is, regardless of the
information by the recruiter, are in turn much more likely to
reenlist and to be satisfied with their job.

III.  The  February  1977  Army-wide  Survey 
A.  Introduction. 

Just as for the April 1977 Pilot Test, because of time and
manpower constraints it was decided to utilize in part the
empirically developed job satisfaction factors from the AFHRL
for a survey to constitute Phase II of the overall project.
Other factors were added based on the previously described
August 1976 Army-wide Survey and its analysis. Work conducted
by the US Army Research Institute for the Behavioral and Social
Sciences was carefully considered for application to the project.
A report by N.W. Ayer, Inc. on the attitudes and motivations
of first-termers toward reenlistment and a study done by the
Office of the Deputy Chief of Staff for Personnel, Department
of the Army, on the attitudes of soldiers leaving the Army were
also researched. These efforts culminated in the development
of an 80 item questionnaire administered Army-wide to a random
sample of 3,708 soldiers in February 1977. Forty-two of these
items pertained directly to an evaluation of satisfaction on a
seven point scale ranging from "Extremely Dissatisfied" to
"Extremely Satisfied". The remaining questions provided
background information and addressed areas thought to influence
job satisfaction or reenlistment intent but which could not be
effectively measured on a satisfaction scale.

Of the 3,708 cases on which the analysis was based, 1,532
comprised the first term sample while the career force sample
contained 2,176 individuals. All first-term soldiers were in
paygrade E-5 or below and had less than four years of active
federal military service. All the members of the career force
were serving a second or subsequent enlistment; were in paygrade
E-3 and above; and had at least three years of active federal
military service.
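
A quick arithmetic check of the sample split described above,
assuming the career force sample is simply the remainder of the
3,708 cases after the 1,532 first-termers (an assumption; the
variable names are illustrative):

```python
total_cases = 3708   # cases analyzed, as reported
first_term = 1532    # first-term sample, as reported

# Remainder attributed to the career force (assumed, see lead-in).
career = total_cases - first_term

# Share of each group in the analyzed sample, in percent.
first_term_share = round(100 * first_term / total_cases, 1)
career_share = round(100 * career / total_cases, 1)

print(career, first_term_share, career_share)
```

On these figures the career force makes up somewhat more than half
of the analyzed cases, which is worth keeping in mind when results
for the two groups are compared.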

B.    Significant  Findings  and  Conclusions 

(1)  Aspects of Army life viewed as the most and least
satisfying:

In general, soldiers indicated greatest satisfaction
with factors intrinsic to their work and the greatest
dissatisfaction with extrinsic or situational factors, as indicated
in Tables 2 and 3. For example, first-termers were most
satisfied with the security provided by their jobs while careerists
were most satisfied with the opportunity to help others by
doing their job. On the other hand, both groups were least
satisfied with the way the Army makes use of its enlisted personnel.

Examination of the responses from first-term subgroups (e.g.,
men, women, high school degree graduates, non-high school degree
graduates) also revealed satisfaction was lowest with regard
to personnel utilization. This widespread sense of malutilization
would argue strongly for additional sensitivity by the Army
toward effective assignment and use of enlisted personnel.
Increased efforts to provide meaningful work, ensure that
training is a reflection of job requirements, and improve the
actual match between primary MOS and work performed would be
most beneficial.

(2)  Predictors of Job Satisfaction, Reenlistment Intent,
and Unit Morale

(a)  Job  Satisfaction 

As indicated in Table 4, satisfaction with work
performed in terms of interest, importance, challenge, variety,
and the use of training and abilities was the primary predictor
of job satisfaction for both first-termers and careerists.
Satisfaction with their supervisor's leadership, technical and
administrative skills was also important to both groups. For
first-termers, changes in the work performed have the greatest
potential for improving attitudes.
This conclusion, however, is contingent on providing work related
to one's Primary MOS and relevant to training received.

Compared to first-termers, the overall job satisfaction of
careerists was more closely associated with satisfaction toward
their work schedules (relating to the length of one's work hours),
opportunities for working and associating with people they like,
and having responsibility for seeing a job through to completion.

(b)  Reenlistment  Intent 

Among the "best" predictors of reenlistment intent
for first-termers and careerists, as observed in Table 5, only
relative satisfaction with pay and allowances emerged for both
groups. First-term soldiers' attitudes toward their work,
the Army's use of enlisted personnel (a significant predictor
of job satisfaction, as previously noted), and recruiter
accuracy were also important contributors to reenlistment plans.
Pertaining to the latter aspect, it would appear that an
accurate and relatively complete portrayal of Army life and work
by the recruiter is an essential ingredient for the long range
retainability of first-termers (also found in the August 1976
Army-wide survey as previously discussed).

Along with satisfaction toward pay and allowances, the factors
identified as the "best" predictors of reenlistment intent for
career soldiers were: Army policies and procedures (e.g.,
promotion, evaluation, reenlistment, discipline); family
recognition and pride in the soldier's work; and duty location.
Among these factors, dissatisfaction was expressed only with
policies and procedures. This situation did not appear to be
a problem of communication since careerists were basically
satisfied with the availability of information concerning facets
of Army life.

(c)  Unit  Morale 

The "best" predictors of unit morale for both
first-term and career soldiers, as shown in Table 6, included
the level of satisfaction with: pride that co-workers have in
the unit and Army; unit policies and procedures (e.g., promotion,
leave, time-off, evaluation); and training given at unit level.
Satisfaction with the quality and availability of both on and
off-post eating facilities, and the Army's emphasis on equality
of the sexes were also important to predicting the opinion of
first-termers of unit morale. For careerists, relative
satisfaction with their supervisors' skills and the availability
of necessary information concerning unit policies and procedures
also contributed to predicting attitudes toward unit morale.

(3)  Enlistment  and  Separation  Reasons 
(a)  Enlistment 

Examination of the initial plans of first-term
soldiers toward an Army career at the time of enlistment in
conjunction with their reasons for enlistment suggests these
three basic categories: recruits planning to serve only one
enlistment (33 percent of all first-termers); those enlisting
without any concrete ideas concerning an Army career (about
40 percent of the sample); and those who joined intending to
make the Army a career (comprising about 20 percent of the
recruits).

Enlistment reasons selected by soldiers were grouped into four
categories to facilitate analysis: (1) enlistment options/
incentives; (2) no personal commitment; (3) patriotic - Army
intrinsic; and (4) "other". As shown in Table 7, among
first-termers, enlistment options/incentives, accounting for 41.8
percent of enlistments, were the most common reasons selected.
Of those choosing enlistment options/incentives, nearly
two-fifths responded to "GI educational benefits" or "learning
a skill or trade to use in civilian life" as their primary
enlistment inducement. On the other hand, the largest percentage
of career personnel (45.4 percent) indicated they had entered
the Army for patriotic/Army intrinsic reasons; this percentage
was markedly higher than that of 26 percent for first-termers.
Such reasons included service to the country and the chance
for adventure, travel and new experiences (these particular
reasons accounting for just over one-third of careerists'
enlistments).

The enlistment reasons of first-termers were also examined based
on their initial plans toward an Army career. As displayed in
Table 8, of those first-term personnel planning to serve only
one term, the majority picked reasons categorized as enlistment
options/incentives as their prime motivators for joining the
Army. Within these reasons, "GI educational benefits" and
"Learning a skill or trade to use in civilian life" accounted
for approximately 50 percent of this group's enlistment.
First-termers who had no real plans concerning an Army career at the
time of enlistment tended to join for reasons categorized as
"No personal commitment" (e.g., taking time to grow up, getting
away from home town, and need for a job). Recruits who initially
planned to make the Army a career were most likely to cite factors
associated with enlistment options/incentives as having
contributed most to their joining. However, they were about
twice as likely as either of the other two groups to have
enlisted due to patriotic or Army intrinsic reasons.

These findings suggest there is a need for additional recruiting
emphasis on how the Army can challenge individuals (in terms
of training, service to the nation, discipline, adventure,
and travel) since these components have the potential for
attracting quality recruits who are far more likely to make
the Army a career. On the other hand, the widespread use of
enlistment options and incentives beyond training and education
(e.g., unit-of-choice, Army area/station-of-choice, cash bonus)
could be curtailed or eliminated with the potential for
considerable dollar savings as well as increased assignment
flexibility.

(b)  Separation 

Reasons for separation were clustered into five
categories: (1) Army policies/procedures/life; (2) one-term
or short-term motivations; (3) job related; (4) personal
motivation; and (5) "other". As shown in Table 9, about two-fifths
of the first-termers and careerists who definitely planned to
separate tended to select factors associated with Army policies/
procedures/life as having most influenced their decision to
leave the Army. They cited the amount of busy work, harassment,
and extra duties; and excessive concern for haircuts, appearance,
and discipline as the most important reasons for their intended
separation. In addition, first-term soldiers also identified
low pay and allowances as an important cause for separation
while careerists noted disdain for their current MOS and
being unable to get one they wanted among factors most
influencing their decisions to leave the Army.

The propensity to reenlist among first-term soldiers who entered
the Army intending to serve only one term appears to be
unaffected by their Army experiences. As indicated in Table 10,
they tended to join to pursue specific goals (e.g., GI
educational benefits), and having attained these objectives
prefer to separate. Those individuals who joined without any
clear-cut plans toward an Army career decided to separate
because of excessive concern for haircuts, appearance and
discipline as well as the amount of busy work, harassment and
extra duties. Perceptions of having very little "real work"
to do were also responsible for inclinations toward separation
for this group. Among first-termers who initially desired an
Army career, low pay and allowances were selected as contributing
most to the decision of soldiers in this group to separate.
Busy work, harassment and extra duty together with the absence
of "real work" were also frequently cited reasons.

Only two job or work related factors which contribute significantly
to a separation decision (amount of busy work, harassment, and
extra duties; and too little "real work" to do) appear to be
addressable by the Army. Although obvious, providing soldiers
with interesting work which challenges their talents and
training promises to create an environment more conducive to
reenlistment (also indicated in the August 1976 survey). In
particular, an increase in meaningful work will raise overall
job satisfaction, heighten reenlistment intent, and ultimately
increase reenlistment.

IV.  The  April  1977  Pilot  Test 

A.  Introduction.

To provide the best possible coverage of those factors which
could be used to assess the inter-relationships between
reenlistment decision, unit morale, and job/career satisfaction,
a pilot test questionnaire was developed over a period of
three months. This questionnaire represented the transition
from the Herzberg-based approach utilized in the job satisfaction
portion of the AOSP to an eclectic approach combining the work
of the Army Research Institute (ARI), the Air Force Human
Resources Laboratory (AFHRL), and MILPERCEN. Although it had
been intended to develop and test a job and career satisfaction
model wholly within MILPERCEN, because of time and manpower
constraints it was decided to capitalize on the extensive
literature review and long-range research conducted by the
AFHRL on job/career satisfaction.

Consequently, the items used in the pilot test questionnaire
were derived in large measure from an Occupational Attitude
Inventory (OAI) developed by the AFHRL. In the initial
development of the OAI, 36 potential satisfaction dimensions or
hypothesized factors were identified by Air Force behavioral
scientists familiar with the military work environment. Items
were written for each dimension, resulting in a final pool of
348 items (approximately 10 items per dimension) which were
validated through analysis of the responses by a random sample
of about 3,000 first-term airmen.

For use in an Army environment, as indicated in Table 11,
32 of the hypothesized factors in the OAI were modified while
four new factors were added (entitled "Family", "Individual",
"Discrimination", and "Army Unique"). It was believed that
these additional factors represent important influences on a
person's motivation and behavior. Of a total pool of 324 items
selected initially in pilot test, 225 were retained. Reduction
of the number of items was based on the following criteria:

(1)  Redundancy

(2)  Reducing the excessively large number of items in the
factors entitled "Individual", "Human Supervision" and
"Family".

The pilot test questionnaire was administered to approximately
1,600 personnel in April 1977 at six CONUS installations. In
addition, about 600 soldiers were interviewed, primarily to
provide insights into the content validity of the questionnaire
and clarity of instructions.

The final instrument was reduced from 225 to 124 items through
use of factor analysis, stepwise multiple regression analysis,
and a subjective review. The subjective review was used to
eliminate duplication within each of the hypothesized factors
and to eliminate items judged to be of little practical value
in terms of job/Army career satisfaction, unit morale, and
retention such as "Your opinion of the Army compared to the
Air Force". The 124 items then constituted all the items
comprising Section B of the comprehensive Army-wide Job and
Career Satisfaction survey administered in November 1977.
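
The paper does not describe how the stepwise multiple regression
was run. As a rough illustration of the general idea only, the
sketch below performs a simplified forward selection: at each step
it adds the item that most reduces the residual sum of squares in
predicting a criterion such as overall job satisfaction, and it
passes over items that duplicate ones already chosen. The item
names, the scores, and the `min_gain` stopping threshold are all
hypothetical, and a full stepwise procedure would normally also use
significance tests for entry and removal.

```python
def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_stepwise(items, criterion, max_items, min_gain=1e-9):
    """Greedy forward selection of predictor items for one criterion.

    items: dict mapping item name -> list of scores (one per respondent)
    criterion: list of criterion scores, one per respondent
    Returns the item names in the order they were selected.
    """
    residual = center(criterion)
    remaining = {name: center(scores) for name, scores in items.items()}
    selected = []
    while remaining and len(selected) < max_items:
        best_name, best_gain = None, min_gain
        for name, col in remaining.items():
            norm = dot(col, col)
            if norm < 1e-12:      # nothing left beyond items already chosen
                continue
            gain = dot(col, residual) ** 2 / norm   # drop in residual SS
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break
        q = remaining.pop(best_name)
        qq = dot(q, q)
        coef = dot(q, residual) / qq
        residual = [r - coef * x for r, x in zip(residual, q)]
        # Remove the chosen item's contribution from the other items.
        updated = {}
        for name, col in remaining.items():
            c = dot(col, q) / qq
            updated[name] = [x - c * z for x, z in zip(col, q)]
        remaining = updated
        selected.append(best_name)
    return selected


# Hypothetical scores for eight respondents (not survey data).
interest = [1, 2, 3, 4, 5, 3, 2, 4]
pay = [3, 1, 4, 1, 5, 2, 6, 3]
items = {
    "work_interest": interest,
    "work_interest_duplicate": list(interest),   # redundant item
    "pay": pay,
}
satisfaction = [2 * i + p for i, p in zip(interest, pay)]

selected = forward_stepwise(items, satisfaction, max_items=3)
print(selected)
```

In this toy example the duplicated item is never selected, which
mirrors the redundancy criterion used in the item reduction.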
TABLE 1


I.   INDEPENDENT FACTORS

     Work Interest
     Work Variety
     Opportunity To See Work
     Work Importance
     Amount Of Responsibility
     Opportunity To Increase Job Skills And Knowledge
     Opportunity For Promotion, Increase In Job Status
     Use Of Training, Skills, Knowledge
     Work Conditions (Facilities, Equipment, Tools)
     Quality Of Technical Supervision Received
     How Job Ranks With Other Soldiers' Jobs
     Conflict Of Job With Family Responsibilities
     Army Pay (Base Pay, Allowances, Special Pay)
     Army Benefits (PX, Commissary, Medical)

II.  CRITERION MEASURES

     Satisfaction With Present Job
     Reenlistment Plans
TABLE 2


FACTORS WITH WHICH FIRST-TERM AND CAREER FORCE PERSONNEL
ARE MOST SATISFIED

                                        1st Term     Career
                                        Personnel    Personnel
FACTOR                                  (Rank)       (Rank)

Chance To Help Others By
Doing Job                                               1

Chance To Have Responsibility
For Seeing A Job Through
To Completion

Opportunity To Work And
Associate With People
You Like

Job Security                               1

Pride Your Family Has And
Recognition Your Family
Gives To Your Job

Availability Of On-Post
Facilities
TABLE 3


FACTORS WITH WHICH FIRST-TERM AND CAREER FORCE PERSONNEL
ARE LEAST SATISFIED

                                        1st Term     Career
                                        Personnel    Personnel
FACTOR                                  (Rank)       (Rank)

The Way The Army Utilizes
Enlisted Personnel                         1            1

The Way The Army Makes
Use Of Equipment, Material,
Supplies                                   2            3

Quality And Availability
Of Housing (On and Off-Post)               3            2

Standard Of Living You
Now Have                                   4            4

"Red Tape" Associated
With Your Job                              5

Pride Your Co-Workers
Have In Your Unit And
The Army                                                5
TABLE 4


"BEST" FIVE PREDICTORS OF JOB SATISFACTION FOR FIRST-TERM
AND CAREER PERSONNEL


FACTOR

Present Duties (Challenge,
Interest, Importance)

Chance To Acquire Training,
Experience, Skills, And
Knowledge Which Can Be
Used In A Civilian Job

Your Supervisor's Leadership,
Technical And Administrative
Skills

Chance To Help Others By Doing
Your Job

Amount Of Work You Have To Do

Work Schedule (Total Hours,
Shifts, Pace Of Work)

Opportunity To Work And
Associate With People You
Like

Chance To Have Responsibility
For Seeing A Job Through
To Completion
TABLE 5


"BEST" FIVE PREDICTORS OF REENLISTMENT INTENT FOR FIRST-TERM
AND CAREER PERSONNEL

                                        First-Term   Career
                                        Personnel    Personnel
FACTOR                                  (Rank)       (Rank)

Present Duties (Challenges,
Interest, Importance)                      1

Army Pay And Allowances                    2

In General, The Things The
Recruiter Told Me About
The Army Were True                         3

The Way The Army Utilizes
Enlisted Personnel                         4

Doing Work Which Bothers
Your Conscience                            5

Army Policies And Procedures

Pride Your Family Has And
Recognition Your Family
Gives To Your Job

Duty Location

Years Of Active Federal
Military Service
TABLE 6


"BEST" FIVE PREDICTORS OF UNIT MORALE FOR FIRST-TERM
AND CAREER PERSONNEL


FACTOR

Pride Your Co-Workers Have
In Your Unit And The Army

Unit Policies And Procedures
(Promotion, Evaluation,
Leave, Training)

Quality And Availability
Of Eating Facilities

Training Given In Your Unit

Amount Of Emphasis On
Equality Of The Sexes

Your Supervisor's Leadership,
Technical And Administrative
Skills

Availability Of Necessary
Information About Unit
Policies And Practices
(Promotion, Evaluation,
Leave, Training)
TABLE 7


PERCENT OF ENLISTMENTS BY CATEGORY FOR FIRST-TERM
AND CAREER PERSONNEL

                                        FIRST-TERM   CAREER
                                        PERSONNEL    PERSONNEL
ENLISTMENT CATEGORY/REASONS                 %            %

1.  ENLISTMENT OPTIONS - INCENTIVES        41.8         23.1

    A.  To Become Eligible For GI
        Educational Benefits               19.3          5.4

    B.  To Learn A Skill/Trade To
        Use In Civilian Life               17.9         10.9

    C.  The Training Of Choice
        Option That I Wanted Was
        Available                           2.1          2.4

    D.  The Enlistment Cash Bonus
        Was Available To Me                 1.1          1.7

    E.  The Army Area/Station Of
        Choice Option That I Wanted
        Was Available                       1.1          1.8

    F.  The Unit Of Choice Option
        That I Wanted Was Still
        Available                           0.3          0.9

2.  NO PERSONAL COMMITMENT                 27.2         23.6

    A.  To Take Time Out To Find
        Myself, Grow-up, Mature            14.2         10.6

    B.  I Couldn't Get A Job (Or
        A Job I Wanted) Anywhere
        Else                                5.9          6.9

    C.  To Get Away From My Home
        Town                                5.8          5.4

    D.  I Had Friends Joining The
        Army Or Already In The Army         1.3          0.7

3.  PATRIOTISM - ARMY INTRINSIC            26.0         45.4

    A.  The Chance For Adventure,
        Travel, And New Experiences        14.4         15.1

    B.  To Serve My Country                 6.9         17.4

    C.  I Wanted To Be A Soldier            2.6         10.1

    D.  My Family Had A History
        Of Army Or Other Military
        Service                             2.1          2.8

4.  OTHER REASON - NOT LISTED
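
Each category percentage in Table 7 should equal the sum of its
lettered reasons, which gives a way to cross-check the figures
quoted in the text (41.8, 45.4, and 26 percent). The sketch below
assumes the sub-item values shown in the code; several digits are
damaged in the source table and are read here by context, so the
inputs should be treated as a best reading rather than as
authoritative.

```python
# Lettered reasons per category, as (first-term %, career %) pairs.
# Values are a best reading of Table 7 (some digits read by context).
table7 = {
    "enlistment options - incentives": [
        (19.3, 5.4), (17.9, 10.9), (2.1, 2.4),
        (1.1, 1.7), (1.1, 1.8), (0.3, 0.9),
    ],
    "no personal commitment": [
        (14.2, 10.6), (5.9, 6.9), (5.8, 5.4), (1.3, 0.7),
    ],
    "patriotism - Army intrinsic": [
        (14.4, 15.1), (6.9, 17.4), (2.6, 10.1), (2.1, 2.8),
    ],
}

# Sum each column within a category, rounded to one decimal place.
subtotals = {
    category: (
        round(sum(first for first, _ in rows), 1),
        round(sum(career for _, career in rows), 1),
    )
    for category, rows in table7.items()
}

for category, (first_term, career) in subtotals.items():
    print(f"{category}: first-term {first_term}, career {career}")
```

On this reading the subtotals agree with the three percentages
quoted in the discussion of enlistment reasons.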
TABLE 8

PERCENT  OgYE^IilMMcMm^M-™ 


riU 

PLANS 

ENLISTMENT  CATEGORY/REASONS 

or 

% 

or 

% 

or 

1. 

ENLISTMENT  OPTIONS  -  INCENTIVES 

^2.8 

37.5 

A. 

To Become Eligible For GI
Educational Benefits

35.7 

19.2 

19.8 

B. 

To  Learn  A  Skill/Trade  To 
Use  In  Civilian  Life 

1^1.6 

R.2 

1^.6 

C. 

The  Training  Of  Choice 
Option  That  I  Wanted  Was 
Available 

1.0 

3.9 

1.5 

0. 

The  Enlistment  Cash  Bonus 
Was  Available  To  Me 

1.5 

1.6 

E. 

The  Army  Area/Station  Of 
Choice  Option  That  I  Wanted 
Was  Available 

13 

2.5 

1.6 

F. 

The  Unit  Of  Choice  Option  That 
I  Wanted  Was  Still  Available 

1.^ 

2. 

NO PERSONAL COMMITMENT

38.7 

a. 

To  Take  Time  Out  To  Find 
Myself^  Grow-Up^  Mature 

11.2 

1^1.1 

18.1 

B. 

I  Couldn't  Get  A  Job  (Or 
A  Job  I  Wanted)  Anywhere 
Else 

5.1 

7.7 

C. 

To  Get  Away  From  My  Home  Town 

5.6 

2.3 

10.7 

D. 

I Had Friends Joining The Army
Or Already In The Army

1.6 

^  ■  ^ 

2.2 
3.  PATRIOTISM - ARMY INTRINSIC

13. i| 

?9.2 

15.9 

A.   The Chance For Adventure,
Travel, And New Experiences

9.8 

7.6 

13.0 

B.   To Serve My Country

2.0 

7.8 

2.2 

C.   I Wanted To Be A Soldier

****  1  ^ 

7.1 

0.6 

D.   My  Family  Had  A  History 
Of  Army  Or  Other  Military 
Service 

1.6 

6.7 

1.1 

4.  OTHER REASON - NOT LISTED

1 

9.0 

7.5  1 

7.0 

i2Q 


:io.4 

o 
TABLE 9


immu 


AND  CAREER 


rhKoUNNhL 

rcRSONNEL 

SEPARATION  CATEGORY/REASONS 

% 

% 

1.    ARMY  POLICIES/PROCEDURES/LIFE 

57.9 

A.    I Think There Is Too Much Concern
For Such Things As Haircuts,
Appearance, And Discipline

9.8 

O  T 

0.3 

B.   The  Pay  And  Allowances  Are  Too  Low 

9.8 

6.3 

C.   The Amount Of Busy Work, Harassment
And Extra Duties

9.6 

11.6 

D.    I Don't Like My MOS And I Can't
Arrange  To  Get  One  I  Do  Like 

6.9 

E.    I Am Not Eligible To Reenlist

1  It 

It  T 

F.    I  Don't  Think  My  Promotion  Chances 
Are  Too  Good 

m 

3.1 

G.    I Couldn't Get The Reenlistment
Option  I  Wanted 

1.2 

0.^1 

H.    I Was Reclassified Into An MOS
That I Have No Interest In And
Don't Enjoy Working In

1.0 

3.3 

I.   The  Medical/Dental  Care  Is 
Inadequate 

0.3 

1.3 

2.  ONE-TERM/SHORT-TERM  MOTIVATIONS 

A, 


B. 


C. 




I  Joined  To  Become  Eligible  For 
GI  Educational  Benefits 

I Did Not Intend To Serve More
Than One Enlistment

I Joined To Learn A Skill/Trade
To Use In Civilian Life And I
Have Done That

•^'^  i 
3ILZ
9.7 
8.9 


0.7 

^.0 


I Joined The Army To Have A
Chance To Find Myself/Grow Up/
Mature And I've Done That

t\^7 

^1.9 

E. 

I Joined The Army For Adventure/
Travel/New Experiences And I've
Accomplished These Things

1,0 

0.2 

3. 

JOB  RELATED 

2£L2 

A. 

I Think There Is Very Little
"Real Work" To Do In The Army

9.5 

5.9 

B.

I  Spend  Too  Much  Time  Working 
Outside  Of  My  Primary  MOS 

3.8 

C. 

The Army Does Not Challenge Or
Demand  Enough  Of  Me 

2.2 

D.

The  Duty  Hours  Are  Too  Long 
And/Or  Irregular 

1.9 

1.9 

E. 

I  Don't  Like  The  People  I 
Work  For 

1.9 

2.3 

4.

PERSONAL  MOTIVATIONS 

lA 

ILZ 

A. 

My  Wife/Husband  Wants  Me  To 
Get  Out 

3.1 

3.4 

B. 

I  Don't  Like  The  People  I  Have 
To  Associate  With 

2.2 

2.6 

C. 

My  Living  Conditions  (Housing/ 
Barracks)  Are  Poor 

1.2 

2.1 

D.

The Things I Can Gain From A
Second Or Subsequent Enlistment
(Job Training, Travel) Are Not
Important Enough To Me

111 

5. 

OTHER  REASON-  NOT  LISTED 

 1 
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TABLE  10 




«  NO  1 
PLANS 

SEPARATION  CATEGORY/REASONS 

% 

% 

z  1 

1. ARMY POLICIES/PROCEDURES

2?.9 

A.   I  Think  There  Is  Too  Much  Concern 
For Such Things As Haircuts,
Appearance, And Discipline

9.3 

7,0 

12.1 

B. The Pay And Allowances Are Too Low

10. 0 

21.1 

^.9 

C. The Amount Of Busy Work, Harassment,
And Extra Duties

5.8 

17.3 

11.8 

D.    I  Don't  Like  My  MOS  And  I  Can't 
Arrange  To  Get  One  I  Do  Like 

2.3 

1.7 

5.^ 

E.   I  Am  Not  Eligible  To  Reenlist 

0.6 

1.^ 

1.5 

F,    I  Don't  Think  My  Promotion  Chances 
Are  Too  Good 

0.i| 

1.7 

2.2 

G. I Couldn't Get The Reenlistment
Option I Wanted

0.5 

1.7 

2.1 

H. I Was Reclassified Into An MOS
That  I  Have  No  Interest  In  And 
Don't  Enjoy  Working  In 

. 

1.3 

2.3 

I.   The  Medical/Dental  Care  Is 
Inadequate 

"""" . " 

0.9 

2. ONE-TERM MOTIVATIONS

2L1 

A.   I  Joined  To  Become  Eligible  For 
GI Educational Benefits

15.2 

0.7 

6.7 

B. I Did Not Intend To Serve More
Than One Enlistment

16.2 

1.3 

2.7 

C. I Joined To Learn A Skill/Trade
To  Use  In  Civilian  Life  And  I 
Have  Done  That 
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6.7 

2.2 

6.3 

D. I Joined The Army To Have A
Chance To Find Myself/Grow Up/
Mature And I've Done That

5.i| 

0.5 

5.5 

E. I Joined The Army For Adventure/
Travel/New Experiences And I've
Accomplished These Things

1.7 

1.3 

0.1 

3.  JOB  RELATED 

112. 

A. I Think There Is Very Little
Real  Work  To  Do  In  The  Army 

9.2 

10.8 

9.3 

B.   I  Spend  Too  Much  Time  Working 
Outside  Of  My  Primary  MOS 

2.5 

6.8 

c.   The  Army  Does  Not  Challenge  Or 
Demand  Enough  Of  Me 

0.9 

5.1 

3.1 

D. The Duty Hours Are Too Long And/
Or Irregular

1  1 
1 . 1 

^  n 

9  n 

e.    I  Don't  Like  The  People  I  Work  For 

\A 

3.^ 

4. PERSONAL MOTIVATIONS

7.2 

AA 

9.5 

A.   My  Wife/Husband  Wants  Me  To  Get 
Out 

lA 

i|.6 

3.5 

B.    I  Don't  Like  The  People  I  Have 
To  Associate  With 

1.6 

1 

3.5 

c.   My  Living  Conditions  (Housing/ 
Barracks)  Are  Poor 

1.6 

1  " 

1.5 

D. The Things I Can Gain From A
Second Or Subsequent Enlistment
(Job Training, Travel) Are Not
Important Enough To Me

1.6 

i 

1.0 

5.  OTHER  REASON  -  NOT  LISTED 

JA 

5.0 

^30 

108 


TABLE  11 

AIR  FORCE  ARMY 


FACTOR  DESCRIPTOR 


iNlii 
IBIS 


FACTOR  DESCRIPTOR 


Achievement 

7 

Achievement 

2 

Activity 

8 

Activity 

Air  Force  and  Unit 
Policies and Practices

18 

Army and Unit
Policies and Practices

17 

Assignment Locality

17 

Assignment  Locality 

16 

Authority 

i| 

Authority

3 

9 

WW  Fivrx l>t rw 

12 

Creativity

ID 

Creativity

5 

Importance

8

Importance

2 

Interest

9

Interest

11 

Knowledge of Results

7

Knowledge of Results

3 

Personal Growth and

Development

9 

Job  Design 

10 

Job  Design 

3 

Optional Social Contact
Required Social Contact

Social  Contact 

11 

Pay  and  Benefits 

12 

Pay  and  Benefits 

8 

Physical Work Environment

13 

Physical  Work  Environment 

9 

Promotion Opportunity

8

Promotion Opportunity

Recognition

9 

Recognition 

Responsibility 

10 

Responsibility 

Independence 

9 

Value of Experience

8

Value of Military Experience

3 
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AIR  FORCE 

ARMY

FACTOR DESCRIPTOR

FACTOR DESCRIPTOR

Physical Safety

6

Physical Safety

3 

Zi 

Economic Security

9 

Service  to  Others 

8 

Service  to  Others 

1 

Social Status

11 

Social  Status 

5 

Sufficiency  of  Training 

12 

Sufficiency  of  Training 

10 

Supervision Received -
Human Relations

15

Human Supervision

16

Supervision Received -
Technical

q

Technical Supervision

q 

J 

Performance Evaluation

Performance Evaluation

Job Change

7 

h 
t 

Tools, Equipment and

Supplies

8

Tools, Equipment and

Supplies

7 

Utilization 

8 

Utilization 

3 

Variety 

9 

Variety 

4 

Work Schedule

15

Work Schedule

6 

Supervisory  Duties 

18 

Supervisory  Duties 

10 

Unclassified

8 

Individual 
Army  Unique 
Discrimination 
Family 

*i  It 

10 
9 
18 

M 

TOTAL

2¥» 
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SECTION  2 
OCCUPATIONAL-TASK ANALYSIS

PAPER PRESENTED AT THE 20TH ANNUAL CONFERENCE OF THE MILITARY TESTING
ASSOCIATION 1978

ORGANISATION: NAVAL MANPOWER UTILISATION UNIT,
HMS VERNON, PORTSMOUTH, ENGLAND

SUBJECT: EXECUTION OF LARGE OCCUPATIONAL ANALYSIS OF THE ROYAL
NAVY'S OPERATIONS BRANCH

SPEAKER: MR C D BEEL


1. INTRODUCTION

Since 1971 RN Occupational Analysis has been carried out by the
Naval Manpower Utilisation Unit (NMUU). In 1975, through the generous
help of the US Navy, the use of the CODAP suite of computer programs was
obtained. Located at Portsmouth, Hampshire, the NMUU is an outport of
the Ministry of Defence Naval Manpower and Training Department.

It is staffed by a Commander in Charge, 3 Officers and 12 Chief
Petty Officers with a small clerical staff.

2.  THE  OPERATIONS  BRANCH 

Until 1973 the various non-technical enlisted men of the weapons,
sensor, and communications operator branches of the RN were quite
separate, with their own structure and training organisation.

To increase efficiency and co-ordination in the modern warfare
environment these various independent branches were merged into
sub-branches of a new Operations Branch. This was to match the radical
changes made to the officer structure, including the introduction of the
Principal Warfare Officer trained to control the integrated fighting
systems of a ship. The School of Maritime Operations was set up as a
common faculty for Operations Branch and Principal Warfare Officer
Training.

It was decided to conduct an occupational analysis of the 11,000
men in the Branch during 1977 to see whether experience gained since its
formation indicated any need for adjustment to training, duties, and
structure.

There were several underlying reasons for the survey. Amongst the
most important were:-

a. Concern that the new structure might lead to a loss of deep
sub-specialist knowledge.

b. The need to establish how well the Branch was coping with manpower
shortages, shorter enlistment engagements and improved sea/shore
ratios.

c. Whether further streamlining of training could be carried out.

d. Concern about retention of seamanship skills and the need for
research into how this area of work was being apportioned between
the sub-branches.
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3.    THE  SURVEY  OBJECTIVE 




The survey objective was primarily for manpower structure and planning
purposes with a spin off for Training Design. As in all recent NMUU
surveys it aimed also to gather attitudes and opinions on many aspects of
Service Conditions and job satisfaction. It was decided that data should
be gathered not only from the job incumbent but also by a secondary
questionnaire from his supervisors and managers.

To clarify beforehand what specific reports should be derived from
the data, and hence the questionnaire structure, the directive contained
very specific primary and secondary objectives.

The Survey occupied the entire resources of the NMUU and a considerable
expenditure in computer processing over 18 months. A 73% sample
(about 8500 men afloat and ashore) and 650 supervisors responded to
their respective questionnaires.

Because of time constraints the remainder of my talk will be principally
concerned with the main survey of surface Fleet ratings at sea and
ashore.

4. QUESTIONNAIRE CONSTRUCTION

Information for the task inventory was gathered from every possible
source, documentary and interview. The pilot fact finding survey sampled
every sub branch and rate to cover as many different jobs as possible,
by ship class. Over 500 people were interviewed using pre-planned data
forms to obtain information at the job (rather than task) level under the
broad headings:

Background  Information 

Billets 

Qualifications 

Ship Employment

Primary Work Area

Secondary  Work  Area 

Work  Area  at  Different  Conditions 

General  Naval  Duties 

Seamanship  Topics. 

The  information  gathered  was  carefully  collated  and  integrated  with 
other  sources  of  data. 

Starting initially in specialist groups, then combining, a scalar
diagram of all tasks of the Operations Branch was built up. These are
examples. From this was derived the basic task inventory to which were
added specific questions needed to satisfy all aspects of the directive.

To meet the requirement to examine common operator and training
areas, the decision was taken to create one raw data base covering all
sub branches based on one questionnaire structure.

The result was a formidable sized questionnaire - which created a
dilemma. In practical terms we doubted whether any respondent could be
asked to study every task in the inventory and all secondary questions
without losing interest and producing dubious results. As you see by
this example, we tried to keep him in the right frame of mind! But we
did not want to constrain his answers into specific areas because of the
commonality research aspect. Incidentally, as a matter of policy, the
questionnaire is anonymous.

In the event a compromise was used on the task inventory, by
subjectively dividing tasks into categories:

a. Pure Specialist

b. Common Ship Work

c. Areas of Likely Overlap.

This enabled us to limit the task of the respondent and hopefully to
achieve the objective. Supporting information was gathered in similar
categories.

5. COMPUTER FILE CONSTRUCTION

Two sections (Operational Duties and the task inventory) were
incorporated in the questionnaire, both covering the whole of the man's
apportionment of time but at different scalar levels. This arrangement could
not be handled by CODAP in 1 computer file, so 2 CODAP files were
envisaged and designed as part of questionnaire development. This carried
with it some secondary benefits:

a. Operational Duties time section with service conditions/job
satisfaction data only, reduced file length and computer times for
this type of information.

b. Some CODAP internal size limits could be side stepped.

c. Operational Duties could be used as population identifiers in
the main file for job descriptions from the task inventory.

6. EXECUTION

a. Public Relations

Based on our earlier experience and because the questionnaire had to
be so large, and because our population extended over a large range of
I.Q. and ability, an extensive publicity campaign was adopted.

Additionally liaison visits were made to as many as possible ships and
establishments in the UK by job analysts from the NMUU.
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b.    Distribution  and  return  of  Questionnaires 


The despatch of questionnaires and their subsequent return gave to
the Unit the aspect of a mail order business for the period June -
September 1977. Some 11,000 were sent out.

In the event some Q^O questionnaires were returned. They were
scrutinised to discard bad books. (This was a very low percentage, less than
1%).

Manual coding was limited to allocating

(1) Case Number
(2) Ship/Establishment Code
(3) 3 Digit Sub Branch/Rate

More refined ship/establishment coding providing various types of
classification for grouping them was done by computer program, to reduce
human error.

c.  Data  Capture 

Data capture by optical mark reading would have been preferred but
was not available on cost grounds.

Key to disk processing was used at the RN's Bureau West facility.

Data was transferred by tape to the computer and was programmed (as one
combined operation) to cumulatively build a SEA and SHORE file in the
following stages :-

Coding Checks
Refined Coding Addition

Sorted into Sub Branch/Rate Order

Merged in Sort Order into File.

Despite all the checks, some 'rogue' cases were not detected at this
time and later caused problems of degradation of output.
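
The four stages listed above (coding checks, refined coding, sorting, merging) can be sketched in Python as follows. This is only an illustration of the flow, not the NMUU's program: the record fields, the `SHIP_CLASS` lookup with its invented codes, and the function names are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_number: int   # manually allocated case number
    ship_code: str     # manually allocated ship/establishment code
    branch_rate: str   # manually allocated 3-digit sub-branch/rate code
    afloat: bool       # True -> SEA file, False -> SHORE file


# Hypothetical lookup for the "refined coding" stage (invented codes).
SHIP_CLASS = {"F01": "frigate", "D02": "destroyer", "E10": "shore establishment"}


def coding_check(case):
    """Stage 1: accept a case only if its manual codes are well formed."""
    return (
        case.case_number > 0
        and case.ship_code in SHIP_CLASS
        and len(case.branch_rate) == 3
        and case.branch_rate.isdigit()
    )


def build_files(batch, sea=(), shore=()):
    """Add one batch to the cumulative SEA and SHORE files.

    Returns (sea, shore, rejected); both files stay in sub-branch/rate order.
    """
    new_sea, new_shore, rejected = [], [], []
    for case in batch:
        if not coding_check(case):
            rejected.append(case)
            continue
        # Stage 2: refined coding is added by program rather than by hand.
        record = (case.branch_rate, case.case_number, SHIP_CLASS[case.ship_code])
        (new_sea if case.afloat else new_shore).append(record)
    # Stage 3: sort the batch; stage 4: merge it into the existing sorted file.
    sea = list(heapq.merge(sea, sorted(new_sea)))
    shore = list(heapq.merge(shore, sorted(new_shore)))
    return sea, shore, rejected
```

A check of this kind only catches malformed codes. A case whose codes are well formed but wrong would still pass, which is one way 'rogue' cases could reach the merged file.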

d.  Computer  Processing 
Final file statistics were:

File        Cases   Record length
SEA         5800    59 card images
SHORE       2200    59 card images
SUBMARINE    820    14 card images

Processing of NMUU CODAP is by batch mode at an Army Pay Computer and is
run when time allows between primary pay processing. The size of the
SEA file in particular created elapsed time problems which had not been
fully anticipated. Some job runs went to 8 hours elapsed time. Many
were stopped by the operator because of other requirements. To meet this
a policy of splitting work had to be adopted to enable part jobs to run
in smaller time gaps between pay runs - but this led to analysis problems
at desk level.

A decision to produce a complete package report integrating all aspects
for each work area or particular group, e.g. job description, incumbent
attitudes to training, supervisor opinion on training, incumbent job
satisfaction, etc, meant that many separate computer jobs had to be run
to satisfy primary analysis needs for one report. Often the report was
held up for one aspect whose computer job was waiting in line behind
many more. Allocating priorities became the name of the game. Keeping
records of what printouts had already been obtained became difficult.

e.  Analysis 

Because  of  the  policy  of  integrated  reports  and  a  very  limited 
distribution  of  raw  printout,  the  entire  unit  work  force  has  been 
involved  in  the  production  of  reports,  for  the  best  part  of  1  year.  We 
would  like  to  make  a  more  direct  use  of  the  printouts  by  giving  them  a 
wider distribution to our 'customers'. But people generally seem to be
in some way deterred by computer prints, and find them difficult to
understand.

Despite the difficulties mentioned, 50 very useful reports have gone to
the authorities interested. Here are a few examples to illustrate the
variety and scope.

The NMUU can only point out significant data results and possible
conclusions, but has no executive authority to decide what needs to be done.
On  the  whole  this  is  thought  to  be  the  best  arrangement  for  a  management 
information  service  like  ours.    But  benefits  will  take  some  time  to 
appear  -  NMUU  reports  provide  only  one  contribution  to  management  decision. 

7.  CONCLUSION 

The Operations Branch Survey was a success in its planning, execution
and results. The CODAP program package coped easily with the large files
and did all that was expected of it. Nevertheless a few lessons were
learned:-

a. Big is not necessarily beautiful. The sheer size of the job
created many problems for a small Unit.

b. Mixing aims - Manpower, Training, Job Sat - seems attractive in
terms of a single visitation to the Fleet. But the value of results
is downgraded by incompatibility of aims.

c. NMUU Staff suffered from lack of variety of work during a long
analysis period. Some members joined after the survey started and
barely saw the end of it. One could say that our own job
satisfaction suffered a little!

Thank you gentlemen. Before attempting to answer any questions may I say
how much the Royal Navy and my Unit appreciate the privilege of attending
this annual meeting where so many experienced authorities in the field
are convened.
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TO  NMUU  DIRECTIVE 

SECQWDARY  PURPOSE  WHERE  POSSIBLE.  DURING  THE  SURVEY.  iNFORfvlATiON  iS  TO  BE 
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I    ATTITUDES  TO  ADVANCEMENT 

n.  THE  EMPLOYMENT  OF  THE  OPERATIONS  BRANCH  CO  ORDINATOR. 
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AND  CRITERION  DEFINITION 
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While  the  technical  literature  pertaining  to  Independent    measures  (such 
as    aptitude    tests,    vocational    interest    inventories,    and    so    on)  is 
burgeoning,    much  less    time  and    attention     is    paid    to    advancing  the 
technology    of    dependent    or  criterion  measures.    One  reason  that  useful 
approaches  for  handling  the  "criterion  problem"  have  been  slow  to  evolve 
is    that  procedures  required  to  surmount  certain  technical  aspects  of  the 
problem  have  yet  to  be  developed,    or    are    not    widely    known.  Another 
reason    is    that,  although  relevant  techniques  for  handling  other  aspects 
of the problem have been published, insufficient systematic effort has
been expended to integrate them into practical research strategies. A
research  strategy  using  nonmetric  multidimensional  scaling  was  developed 
to    fill    in  some  of  these  practical  technological  gaps.    This  was  tested 
on    Air    Observers    (operators    of    complex    sensor    and  communications 
equipment  used  in  antisubmarine  and  Northern  surveillance  aircraft  in  the 
Canadian Forces). The content dimensions produced in this application
proved:    (a)    highly  reliable  and  internally  consistent  within  relatively 
homogeneous    groups    of    individuals;      (b)    readily     and  meaningfully 
generalizable     across    a    variety    of    work    situations  (responsibility 
levels);     (c)    valid  in  terms    of    showing    significant    relationship  to 
external    variables,    and    being  readily  integrated  into  larger  bodies  of 
scientific knowledge; and (d) extendible in theoretically and
practically important ways in other studies. A more comprehensive
treatment  of  the  results,  discussion,  and  conclusions  deriving  from  this 
research    programme  is  available  on  request  from  the  author.    The  present 
paper  focuses  specifically  on  the  design  and  analytic  methods  used,  since 
it    is  believed  that,  as  a  general  research  strategy,  they  have  relevance 
for    those    involved    in    task     analysis     and     criterion  definition, 
particularly in human factors engineering, test and training validation,
and  performance  evaluation  applications. 
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In  theoretical  and  applied  psychological  research  one  is  often 
faced with: (a) defining what dependent or criterion measures are likely
to be important in specific content areas, and (b) developing procedures
to collect reliable, valid data reflecting these dimensions once they
have been defined. However, while the technical literature pertaining to
independent measures (such as physiological indicators; both written and
other expressions of aptitudes, vocational interests, personality,
attitudes, or job performance, and so on) is burgeoning, less attention
has been devoted to advancing the technology of dependent or
criterion measures. This is true of research having to do with selection
and classification, training and education, performance evaluation, human
factors, human and organizational development, and other areas of
psychology where a sound knowledge of the performance content domain with
which one is dealing should be the basic starting point for subsequent
research. As Christensen and Mills (1967) point out:


The  criterion  problem  is  much  like  the  weather  - 
all psychologists talk about it but very few do
much about it. And yet its central importance is
disputed by no one. Over twenty years ago
Thorndike (Note 1, p. 29) attested to its
importance in military operations when he said,
"Certainly the most fundamental and probably also
the most difficult problem in the Aviation
Psychology Program was that of obtaining
satisfactory criterion measures against which to
validate tests and evaluate variations of training
methods" (p. 335).


A number of papers have recently been published about various
aspects of the "criterion problem" (e.g., Christensen & Mills, 1967;
Dunnette, 1963; Inn, Hulin & Tucker, 1972; Crooks, Note 2). This work
has, as yet, failed to produce many concrete solutions. The discussions
have generally been more useful in defining various aspects of the problem
than in demonstrating how they might be handled.


In terms of what task analysis should be, or what it should
accomplish, Miller (1953) has argued that task analysis should involve the
systematic study of the behavioural requirements of tasks. Gagne (1963)
has suggested that it should allow inferences based on the knowledge of
human functions concerning what kinds of abilities, skills and knowledge
are required in order for a human being to carry out specific tasks.
Kershner (Note 3) has indicated that job analysis should answer the
"what", "how" and "why" of tasks.

Ammerman (Note 4), on the other hand, has been a bit more explicit,
suggesting that task analysis should: (a) yield an organizational scheme
accounting for all previous knowledge of relevant job activities, (b)
identify and account for all activities relevant to the specific job, (c)
take into account and be consistent with psychological concepts of human
behaviour, and hopefully, be generalizable to a range of jobs, and (d) be
temporarily practical and meaningful to users. Dumas and Muthard (Note 5)
have argued that task analysis should also offer: (a) classification of
tasks in situ to minimize the introduction of errors, (b) measurements
that are reliable and on interval scales insofar as is possible, and (c) a
methodology which is compatible with appropriate system analytic and
operations research techniques so that critical decisions made about
specific aspects of the job can be simultaneously related to other
important data elements.


It is difficult to argue with these lists of desirable task
analysis characteristics and objectives. In a sense they have the glow of
motherhood. Unfortunately, in themselves, they do not imply how these
ends are to be achieved. This fact notwithstanding, the points raised
were regarded as desirable goals for the task analysis procedures outlined
in succeeding sections.


Task Description Versus Task Analysis


Most  conventional  task  analysis  strategies  have  been  limited  to  the 
use  of  specific  data  gathering  procedures  in  conjunction  with  a  rational 
taxonomy. The intent in these studies is to classify task elements
according  to  psychological  constructs  reflecting  the  particular 
theoretical  predilections  of  the  investigators. 


Breaking  a  job  down  into  a  number  of  reasonably  elementary 
components  and  then  rationally  classifying  these  according  to  some  scheme 
can  be  useful  as  a  first  step  in  a  larger  program.  This  process  does  not 
go much beyond what Miller (1963) calls "task description". In addition
to these preliminary data collection and organizing phases, one requires
means  for  obtaining  a  behavioural  understanding  of  the  task  requirements. 
Miller  has  reserved  the  term  "task  analysis"  for  this  latter  process. 

Fleishman (1967a, 1967c) and Finley, Obermayer, Bertone, Meister
and Muckler (Note 6) have argued that investigators must strive to move
beyond the mere identification and classification of discrete task
elements in specific work settings to the distillation of a relatively
parsimonious set of unifying fundamental behavioural elements gathered
from a number of settings. The Fleishman approach has tended to involve
examination  of  various  aspects  of  performance  in  laboratory  settings. 
This  has  produced  some  very  useful  information  but  cannot  avoid  suffering 
from  a  certain  amount  of  artificiality  since  it  ignores  (and  in  many  ways 
is  designed  to  eliminate)  contextual  factors.  The  importance  of  the 
context  in  which  tasks  are  performed  is  well  recognized  (see  discussions 
by  Alluisi,  1969;  Christensen  &  Mills,  1967;  Grodsky,  1967;  Miller, 
1963;  Prien  &  Ronan,  1971),  and  any  deemphasis  of  it  could  not  help  but 
constitute  a  major  weakness  of  this  approach.  Finley  et  al.  (Note  6) 
have  argued  for  the  identification  of  "fundamental  behavioural  dimensions" 
underlying  tasks  identified  in  the  "man-machine"  environment,  but  after 
conducting  a  fairly  comprehensive  review  of  the  literature  were  unable  to 
suggest  how  this  might  be  done. 


It  was  felt  that  multidimensional  scaling,  in  conjunction  with 
regression analysis, and allied multivariate techniques might be suited to
the kinds of analyses called for in the preceding section. In general,
given  a  matrix  of  numbers  showing  how  similar  each  object  in  a  set  is  to 
each  of  the  remaining  objects  in  the  same  set,  the  goal  of 
multidimensional  scaling  (MDS)  is  to  determine  the  minimum  dimensionality 
of  the  relationships  as  well  as  the  projections  or  scale  values  of  the 
objects  on  each  of  the  resulting  dimensions.  This,  of  course,  is 
precisely  what  one  would  like  to  do  in  task  analysis. 


MDSCAL (Kruskal, 1964a, 1964b) and other nonmetric multidimensional
scaling algorithms require input in the form of stimulus by stimulus
similarity (proximity) matrices showing but ordinal interrelationships
among the stimulus objects under study. For these data, the algorithms
attempt to derive a representation of n points (representing the objects)
in a geometric space of smallest dimensionality such that the original
proximities (let these be represented by $\delta_{ij}$), and the final geometric
interpoint distances (let these be represented by $d_{ij}$) are related
monotonically. That is, so the geometric interpoint distances $d_{ij} < d_{kl}$
when the similarities $\delta_{ij} > \delta_{kl}$ (if the $\delta$s are dissimilarities, one requires
$\delta_{ij} < \delta_{kl}$).
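
The monotonicity requirement can be checked directly by counting pairs of object pairs whose order in the input data is reversed in the fitted distances. The sketch below is only an illustration and is not part of MDSCAL; it assumes the input is coded as dissimilarities, and the function name and the dictionary layout (keys are object pairs) are invented for the example.

```python
from itertools import combinations


def order_violations(dissimilarities, distances):
    """Count pairs of object pairs ranked in opposite order by the two measures.

    Both arguments map an object pair such as (0, 1) to a number.
    A result of 0 means the distances are a monotone function of the data.
    Ties in either measure are not counted as violations.
    """
    violations = 0
    for p, q in combinations(sorted(dissimilarities), 2):
        data_gap = dissimilarities[p] - dissimilarities[q]
        dist_gap = distances[p] - distances[q]
        if data_gap * dist_gap < 0:  # opposite signs: rank order reversed
            violations += 1
    return violations
```

For three objects with dissimilarities 1, 2 and 3, distances of 0.5, 1.0 and 2.0 give no violations, while distances of 1.0, 0.5 and 2.0 give one.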

The  analysis  proceeds  through  a  series  of  successive  iterations. 
One  starts  with  an  arbitrary  initial  configuration  (of  known 
dimensionality in n points) which may be randomly generated, a "best
guess" on the part of the investigator, or created in a number of other
ways (e.g., the Young/Torgerson option used in the computer programme KYST
- see Kruskal, Young & Seery, Note 7). Starting from this initial
configuration the n points are adjusted mathematically such that, in a
space of specified dimensionality, their distance interrelationships ($d_{ij}$)
more and more closely reflect the monotonic (ascending or descending)
interrelationships of the respective proximities $\delta_{ij}$. The procedure continues until
one of a number of criteria has been met which indicates (either
absolutely, or in a practical sense) that no more improvement in the
solution is possible. The values of these criteria may be specified by
the investigator, and relate to the number of iterations conducted, how
fast the solution is converging, or how well the monotonic $d_{ij}$ vs $\delta_{ij}$
requirement is met.


The index of how well the monotone relationship between the proximities $\delta_{ij}$
and the distances $d_{ij}$ is met in a particular iteration has been referred to as stress
(Kruskal, 1964a). A zero stress value indicates that a perfect monotone
relationship exists between the dissimilarities and final fitted $d_{ij}$s.
Hampton  (Note  8)  presents  a  more  extensive  conceptual  discussion  of  this 
analytic  model.  One  generally  conducts  separate  analyses  on  the  same 
data,  in  a  number  of  dimensionalities.  One  then  chooses  among  the 
separate  solutions  on  the  basis  of  goodness  of  fit  (low  stress),  parsimony 
(adequate  representation  in  fewest  dimensions),  and  interpretability  (the 
solution  should  make  sense). 
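
The stress index described above can be illustrated with a minimal sketch. It assumes the normalization usually called Kruskal's "stress formula 1" (the text does not say which variant was used) and obtains the monotone fit with a plain pool-adjacent-violators routine. The function names and the dictionary layout for the dissimilarities are illustrative assumptions, not taken from MDSCAL or KYST.

```python
from itertools import combinations
import math

def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # merge while the previous block mean exceeds the newest block mean
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(config, dissim):
    """Stress of `config` (list of coordinate tuples) against `dissim`,
    a dict mapping each pair (i, j) with i < j to a dissimilarity."""
    pairs = sorted(combinations(range(len(config)), 2), key=lambda p: dissim[p])
    d = [math.dist(config[i], config[j]) for i, j in pairs]
    d_hat = pava(d)  # distances forced to be monotone in the dissimilarities
    num = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    den = sum(a ** 2 for a in d)
    return math.sqrt(num / den)

# Three points on a line; the dissimilarities rank the pairs as the distances do.
points = [(0.0,), (1.0,), (3.0,)]
print(stress(points, {(0, 1): 1, (1, 2): 2, (0, 2): 3}))  # 0.0
```

An iterative programme would move the points to reduce this value and stop on one of the criteria mentioned above; that optimization step is not shown here.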


One might question the appropriateness of using data
interrelationships reflecting only ordinal qualities of measurement to
generate a metric configuration. In discussing the rationale underlying
nonmetric MDS, Shepard (1962) has argued that knowledge of ordinal
relationships of distances really implies much stronger than ordinal
measurement when the points to which the distances refer are considered in
the context of a configuration of known dimensionality. Further, the
greater the ratio of numbers of points to numbers of dimensions, the more
finitely the final configuration can be determined.


Method 


Participants 


Participants in this research project were either members of the
Air Observer trade in the Canadian Forces, or individuals having an
intimate working knowledge of it. The Air Observer is the primary
operator of sophisticated sensing and communications systems on military
ocean and Northern surveillance aircraft. To gain entry to this trade, an
individual must have been trained and have a good record in another trade
in the Canadian Forces. He must also have achieved a minimum standard on
a test of general learning ability. Then, if selected in competition with
others meeting these criteria, the individual undergoes a demanding
programme of aircrew training. On the job itself, the Air Observer must
remain vigilant while monitoring equipment over long periods of time.
These periods are interspersed with sessions of rather intense and
critical activity.


Specific samples in this research programme included: (a) two
groups each of 28 Air Observers (henceforth referred to as OBS1 and OBS2),
(b) 21 Air Observers (referred to as SUPS) holding supervisory positions,
(c) five students (referred to as STUDS) undergoing final stages of
qualification training, and (d) eight commissioned officers (referred to
as ROS) with extensive experience in operations, operational training, and
staff capacities associated with the trade.


Task Definition


As  a  first  step  in  defining  the  content  domain  for  further  study, 
training  manuals  and  checklists  covering  the  range  of  Air  Observer  duties 
in  the  Argus  long  range  patrol  aircraft  were  formulated  into  task  elements 
and  classified  according  to  the  Berliner,  Angell  and  Shearer  (Note  9) 
taxonomy. 


As  a  cross-check,  the  task  elements  were  reviewed  for  completeness 
and  independently  categorized  according  to  the  taxonomy  by  instructors  at 
the  Maritime  Operational  Aircrew  Training  Squadron,  Canadian  Forces  Base 
Greenwood,  Nova  Scotia.  These  individuals  were  well  acquainted  with  all 
aspects  of  the  Air  Observer  job,  since  each  had  many  hundreds  of  hours 
experience  with  it,  both  in  training,  and  in  operational  capacities.  The 
information  from  this  step  was  compared  to  that  from  the  former  one. 


In  a  third  iteration  of  the  procedure,  the  task  elements  generated 
in  the  former  two  steps  were  categorized  according  to  the  taxonomy  by 
senior  operational  personnel  at  Canadian  Forces  Base  Greenwood.  These 
results  were  compared  to  the  composite  of  the  former  two  steps.  In  each 
of  the  separate  applications  of  the  taxonomy,  when  discrepancies  were 
found,  these  were  resolved  by  negotiation  with  representatives  of  the 
various  groups  involved.    In  most  cases,  consensus  was  easily  achieved. 

Finally, the author and two colleagues flew several operational
training missions with Argus crews. The purpose of these flights was to
gain an intuitive idea of some of the contextual and environmental
circumstances in which the tasks are performed.


The procedures described above produced more than 350 task
elements, far too many to be handled by existing MDS
strategies. Further, the elements differed in level of abstraction (a
list of the elements is presented in Rampton, Note 8). In an attempt to
come to grips with these problems, the total list was reviewed with the
help of an officer with extensive operational experience (more than 2500
hours in the Argus aircraft), and reformulated into 166 task functions.
These functions were generated so that all were at about the same level of
abstraction, and were couched in phraseology and jargon that would be
readily understood by the Air Observer. Three statements were added to
this list on the basis of pilot work with the experimental procedures.
The resulting list of 169 task functions is presented in Rampton (Note 8).


Materials 


The  materials  assembled  for  each  participant  involved: 


1.  One or two decks of computer cards, each containing
169 cards on the top of which were printed the 169
task functions (one to a card). Each card also had
a unique identification code punched, but not
printed, in columns 72-80.


2.  One  or  two  white  computer  cards  on  which  were 
printed  spaces  for  identification  and  other 
pertinent  information. 

3.  One or two decks of 16 blue pile-separator cards,
each containing one of the numbers from 1 to 16.


4.  One or two sheets of paper on which all possible
pairings of the numbers from 1 to 15 (a number was
not paired with itself) were arranged in random
order, making n(n-1)/2 = 105 pairs.


5.  A booklet containing all task statements.
Accompanying each of these booklets were three sets
(of a possible seven) of five point scales. The
full set of variables was: (a) degree of
concentration required in performing the task, (b)
difficulty level of the task, (c) manual skill
required in performing the task, (d) the importance
of the task to successful completion of missions in
which it is typically performed, (e) the cooperation
or teamwork generally required to complete the task,
(f) the importance of speed or working quickly to
successful completion of the task, and (g) the
degree of mental effort (decision making,
calculating, memory, and planning) required to
successfully complete the task.

The duplicate materials alluded to in points 1 to 5 above were used in a
test-retest reliability study of the sorting task described below.
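
As a small worked check on item 4, the sheet of randomly ordered pile pairings can be sketched as follows. The function name and the fixed seed are assumptions made only so that the example is reproducible; the source does not say how the random order was produced.

```python
from itertools import combinations
import random

def pair_sheet(n_piles=15, seed=0):
    """All unordered pairings of pile numbers 1..n_piles, in random order."""
    pairs = list(combinations(range(1, n_piles + 1), 2))  # no self-pairings
    random.Random(seed).shuffle(pairs)                    # hypothetical seed
    return pairs

sheet = pair_sheet()
print(len(sheet))  # 15 * 14 / 2 = 105
```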

Experimental  Design 

Pilot work had suggested that order effects might be important in
the presentation of the task statements and answer sheets. Therefore, as
a first step in incorporating a partial balance into the presentation of
the task statements to participants, the following four blocks of
statements were created: (a) thirty items relating to antisubmarine
warfare, (b) fifty items relating to electronic counter-measures and
communications, (c) forty-nine items relating to detection functions, and
(d) forty items relating to the use of RADAR. These blocks were
independently organized into two four-by-four Latin squares. One of these
was used for balancing the presentation of items in the card deck, and the
other was used for presenting the items in the task statement booklet. To
control for order effects in presentation of the three rating scales to be
used by each person (time constraints dictated that all seven sets of
scales could not be done by each), these were arranged in a Youden square
design.
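
The balancing devices named above can be sketched as follows. This is only an illustration: the source does not say which particular Latin or Youden squares were used, so the cyclic constructions, the function names, and the short block labels below are assumptions. The Youden square is developed cyclically from the base block (0, 1, 3) modulo 7, which gives seven answer-sheet combinations of three scales each.

```python
def cyclic_latin_square(symbols):
    """Row r is `symbols` rotated left by r places."""
    n = len(symbols)
    return [[symbols[(r + c) % n] for c in range(n)] for r in range(n)]

def cyclic_youden_square(scales, base=(0, 1, 3)):
    """Row r lists the three scales on answer-sheet combination r,
    in order of presentation."""
    n = len(scales)
    return [[scales[(b + r) % n] for b in base] for r in range(n)]

blocks = ["ASW", "ECM/communications", "detection", "RADAR"]
scales = ["Concentration", "Difficulty", "Manual Skill", "Importance",
          "Cooperation", "Speed", "Mental Effort"]

orders = cyclic_latin_square(blocks)    # 4 presentation orders of the item blocks
sheets = cyclic_youden_square(scales)   # 7 answer-sheet combinations
for row in sheets:
    print(row)
```

In this construction every scale appears once in each of the three presentation positions, and every pair of scales shares exactly one answer sheet.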

The design precautions outlined above provided four different task
booklet combinations, four different card deck combinations, and seven
different answer sheet combinations. As an additional control for any
interaction between book type and answer sheet presentation, the book and
answer sheet combinations were arranged in blocks of twenty-eight (i.e., 4
books x 7 answer sheet combinations = 28) so that each answer sheet
combination was paired with each of the book types. Within each block of
twenty-eight book/answer sheet combinations, the four deck types were
assigned so that seven of each type were randomly represented in each
block.


Integers from one to twenty-eight were then randomly assigned to
the book/answer sheet/deck combinations. These numbers represented the
order of presentation of specific treatment combinations to participants.
Four separate blocks were created in this way, with participants
independently and randomly assigned to treatments in each.
The first two blocks of this design were reserved for the OBS1 and OBS2,
while the third block was reserved for the first twenty-eight SUPS and
ROS. The remaining participants (the STUDS and others) were assigned in
order to treatment combinations in the fourth block.
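
The construction of one block of twenty-eight treatment combinations, as described above, might be sketched like this. The function name, the integer codes for books, answer sheets and decks, and the seed are hypothetical; only the counts (4 books, 7 answer sheets, 4 deck types, 4 blocks) come from the text.

```python
import random

def build_block(rng, n_books=4, n_sheets=7, n_decks=4):
    """One block of 28 treatment combinations in random presentation order."""
    combos = [(book, sheet)
              for book in range(1, n_books + 1)
              for sheet in range(1, n_sheets + 1)]       # 4 x 7 = 28
    decks = [deck for deck in range(1, n_decks + 1)
             for _ in range(len(combos) // n_decks)]     # seven of each deck type
    rng.shuffle(decks)                                   # random deck assignment
    treatments = [(book, sheet, deck)
                  for (book, sheet), deck in zip(combos, decks)]
    rng.shuffle(treatments)                              # random order of presentation
    return treatments

rng = random.Random(1978)                       # hypothetical seed
design = [build_block(rng) for _ in range(4)]   # four independently randomized blocks
print(len(design), len(design[0]))              # 4 28
```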


Procedure


The experimental procedure followed is outlined in more detail in
Rampton (Note 8), but basically consisted of having participants sort, and
then subsort, the piles of task statements on the basis of the similarity
they felt existed in the performance of the functions on each of the
cards. Free, but not completely unconstrained, sorting was used, in that a
maximum of 15 piles was to be used in the major sorts (with an additional
"miscellaneous" pile to be created only if absolutely necessary), with a
maximum of five to be used in the subsorts (again, with an additional
sixth pile to be used if required). The maximum numbers of piles
for both the sorts and subsorts were chosen on the basis of what a rather
extensive pilot project suggested were more than required by most
participants.


Between  the  sort  and  subsorting  stages,  participants  were  asked  to 
serially  number  all  of  the  major  piles  which  they  had  placed  on  the  table 
in  front  of  them.  On  a  sheet  of  paper  containing  105  scales  each  with  the 
headings  "Category  X",  "Category  Y",  (where  X  and  Y  stood  for  the  numbers 
attached to the categories) and the numbers from 1 to 5, participants
rated  the  similarity  of  all  possible  pairings  of  the  constructs  reflected 
in  each  of  the  piles. 


After  the  above  steps  were  completed,  a  booklet  containing  all  task 
statements  was  distributed  to  each  individual.  Inserted  inside  the  back 
cover  of  the  booklet  were  three  sheets  of  defined  scales  to  allow  rating 
of  each  task  function  on  the  variables:  (a)  Concentration  Required,  (b) 
Difficulty,  (c)  Manual  Skill  Required,  (d)  Importance,  (e)  Cooperation  or 
Teamwork  Involved,  (f)  Speed  Required,  and  (g)  Mental  Effort  Required. 
Taking  each  of  the  three  sheets  of  paper  separately,  participants  were 
instructed  to  go  through  the  task  statement  booklet  three  times,  and  to 
rate  each  task  according  to  the  variables  defined  on  the  respective  sheets 
of paper.


After all data were collected, a card containing biographical
information was keypunched and concatenated with the pile rating and other
data described above to form the test data set for each person. A
computer programme assigned proximity indices between task statements for
each individual as follows: (a) if two statements were not grouped
together they received a proximity index of 6 - X, where X was the
similarity rating assigned their respective major categories, (b) if two
statements were grouped together after the first sort but not after the
second, they received a conjoint score of 7, and (c) if two statements
were grouped together after the final sort they were assigned a score of
[...]. Stimulus by stimulus half matrices (without diagonal) having
169 x (169 - 1)/2 = 14,196 similarity estimates as entries were thus
produced for each individual.
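
The scoring rule just described can be sketched for one participant as follows. The data layout (pile labels per statement and a table of pile-pair ratings) and all names are assumptions. The score for statements still together after the final sort is not legible in the source, so it is left as a required argument; the value 8 used below is purely a placeholder.

```python
def proximity_half_matrix(major, sub, pile_similarity, final_score):
    """Proximity index for every pair of statements (i < j).

    major[i]        : major pile holding statement i
    sub[i]          : subpile of statement i within its major pile
    pile_similarity : dict, frozenset({p, q}) -> 1-5 rating of piles p and q
    final_score     : score for pairs still together after the final sort
    """
    n = len(major)
    prox = {}
    for i in range(n):
        for j in range(i + 1, n):
            if major[i] != major[j]:      # (a) never grouped together
                rating = pile_similarity[frozenset((major[i], major[j]))]
                prox[(i, j)] = 6 - rating
            elif sub[i] != sub[j]:        # (b) together after first sort only
                prox[(i, j)] = 7
            else:                         # (c) together after final sort
                prox[(i, j)] = final_score
    return prox

# Four statements: three in pile 1 (two of them in the same subpile), one in pile 2.
example = proximity_half_matrix(
    major=[1, 1, 1, 2], sub=[1, 1, 2, 1],
    pile_similarity={frozenset((1, 2)): 2},
    final_score=8,                        # placeholder, not from the source
)
print(example[(0, 2)], example[(0, 3)])   # 7 4
```

With 169 statements the same loop yields the 14,196 entries of the half matrix.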


Analyses and Brief Discussion of Results


Reliability of the Sorted Indices


Thirty-one of the journeyman (OBS1 and OBS2) Observers repeated the
sorting and pile rating stages for a test-retest reliability study. Each
individual received the same combinations of materials in both sessions
except that on retest, the unidimensional ratings were not done.


A more comprehensive justification for the use of sort-generated
proximity indices as input for multidimensional scaling, their
reliability, and validity, is presented elsewhere (Rampton, Note 8).
Suffice it to mention here that dissimilarity indices produced by taking
arithmetic means across individuals in each of the test and retest
sessions, and then computing a Pearson r down the respective aggregate
proximity indices between sessions, produced a correlation of .94, thus
indicating considerable retest reliability.
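
The reliability computation described above (average the individual half matrices within each session, then correlate the two aggregates entry by entry) can be sketched as follows. The function names and the dictionary representation of a half matrix are assumptions, and the toy data are invented solely to exercise the code.

```python
import math

def aggregate(matrices):
    """Entry-wise arithmetic mean of per-person half matrices (dicts keyed by pair)."""
    return {k: sum(m[k] for m in matrices) / len(matrices) for k in matrices[0]}

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def retest_reliability(test_matrices, retest_matrices):
    """Correlate aggregate test and retest indices over all stimulus pairs."""
    test, retest = aggregate(test_matrices), aggregate(retest_matrices)
    keys = sorted(test)
    return pearson_r([test[k] for k in keys], [retest[k] for k in keys])

# Invented toy data: two people, three stimulus pairs, two sessions.
test_session = [{(0, 1): 1, (0, 2): 7, (1, 2): 3},
                {(0, 1): 3, (0, 2): 7, (1, 2): 5}]
retest_session = [{(0, 1): 2, (0, 2): 7, (1, 2): 3},
                  {(0, 1): 2, (0, 2): 7, (1, 2): 5}]
print(round(retest_reliability(test_session, retest_session), 3))
```

The same two functions would serve for the even-odd comparison, with the aggregates formed over the two subgroups instead of the two sessions.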


Thirty of the individuals who participated in the test-retest
reliability study were divided into groups on an even-odd basis. Two
aggregate proximity matrices were created by taking arithmetic means of
the dissimilarity indices across individuals in each group. A correlation
was computed down the respective aggregate proximity indices in the test
and retest sessions, producing within-group consistency correlations of .83
and .83, respectively. These values give further evidence that
considerable consistency existed in the way that different individuals
perceived the task statements.


Nonlinear MDS Analyses


Five half matrices consisting of average similarity indices for the
ROS, SUPS, OBS1, OBS2, and STUDS were calculated by computing arithmetic
means of respective stimulus by stimulus values across all individuals in
each group. An additional "total average" (AVE) matrix was created by
calculating analogous indices over all participants. Data in these six
matrices served as the basic input for the MDS analyses.


As nonmetric MDS programmes iterate toward a goal of stress
minimization, they may get caught up in less than optimal solutions by
locating local function minima. This is more likely to occur the more
dissimilar the initial configuration is from the "optimal" configuration.
Spence (1972), in a rather extensive empirical comparison of a number of
MDS strategies, indicated that a procedure developed by Young and
Torgerson (1967) may effectively circumvent local minimum problems. This
algorithm, which involves using conventional metric MDS on input data to
produce an initial configuration for the nonlinear MDS, was modified
slightly and used to start MDSCAL analyses of the AVE proximity matrix.
As shown in steps 1 to 3 of the schematic analysis representation of Table
1, this modification involved creating a randomly augmented "initial"
matrix for MDSCAL of the AVE data, and was required since KYST can handle
proximity matrices reflecting only 60 elements, rather than the 169
involved in this project. Figure 1 represents a plot of the stress values
of AVE MDSCAL analyses in configuration dimensionalities from ten down to
two.
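
The idea of seeding a nonmetric run with a metric solution can be illustrated with textbook classical (Torgerson) scaling. This sketch is not the modified, randomly augmented procedure described above, nor the KYST implementation; it assumes the input is already a full symmetric matrix of dissimilarities, and it extracts the leading axes with a simple power iteration to stay within the standard library. All names are illustrative.

```python
import math

def classical_mds(dist, dims, iters=500):
    """Classical metric scaling of a symmetric distance matrix (list of lists)
    into `dims` dimensions; returns one coordinate list per point."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    grand = sum(row) / n
    # double-centred scalar-product matrix
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + k + 1.0) for i in range(n)]   # arbitrary start vector
        for _ in range(iters):                          # power iteration
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]            # deflate the axis just found
    return coords

# A 3-4-5 right triangle is recovered exactly in two dimensions.
triangle = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
start = classical_mds(triangle, dims=2)
print(round(math.dist(start[1], start[2]), 6))  # 5.0
```

The resulting coordinates would then serve only as a starting configuration, to be adjusted by the iterative nonmetric procedure.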


Insert  Table  1  and  Figure  1  about  here 


The  resulting  169  x  10  AVE  configuration  was  used  as  the  initial 
configuration for MDSCAL analyses of each of the ROS, SUPS, OBS1, OBS2,
and  STUDS  proximity  data.  Numbers  of  iterations  and  stress  values  for 
these  analyses  are  listed  at  the  bottom  of  Table  1. 


Representation  of  the  task  dimension  scale  values  would  require  more  space 
than  is  justified  here,  but  may  be  obtained  from  Rampton  (Note  8).  Visual 
inspection  of  these  values  showed  considerable  similarity  across 
configurations.


Comparing MDS Structures Using Regression Analysis


Multiple linear regression (REGRESS; see Miller, Shepard & Chang,
1972, for a discussion of the specific technique used) provided the means
for a more rigorous comparison across configurations. Results of these
analyses are presented in Tables 2 to 5.


Insert Tables 2 to 5 about here


In the terminology of the traditional test-criterion validation paradigm:
(a) the ten dimensions of the SUPS, OBS1, OBS2, and STUDS configurations
served as "predictors" in separate analyses, (b) each of the 10 ROS
dimensions served in turn as a "criterion" in each set of analyses (SUPS
vs. ROS, OBS1 vs. ROS, OBS2 vs. ROS, and STUDS vs. ROS), and (c) the
169 task statements served as "subjects" in each run. One can conceive of
these analyses as equivalent to locating directions or vectors in the
SUPS, OBS1, OBS2, and STUDS configurations correlating most highly with
each of the 10 ROS dimensions. Thus the multiple correlations (Rs) of
Tables 2 to 5 reflect the strength of relationship between these best
fitting, artificial ROS vectors, and the actual ROS dimensions. The Rs
are seen to be generally quite large. (Computing confidence intervals in
the manner suggested by Garrett (1966, p. 416) indicates that a critical
value of R = .20 is required to be statistically significant at p<.01 for
each of the multiple Rs reported in this section.)
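
The vector-fitting idea described above can be sketched with ordinary least squares: regress one criterion (for example, one ROS dimension) on the coordinates of the tasks in another group's configuration, then report the multiple R and the direction of the fitted vector. This is a generic illustration and not the REGRESS programme itself; taking the direction cosines as the normalized regression weights is an assumption, as are all names below.

```python
import math

def solve(a, y):
    """Solve the square system a x = y by Gaussian elimination with pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_vector(config, criterion):
    """Regress `criterion` (one value per task) on the columns of `config`.
    Returns the multiple R and the direction cosines of the fitted vector."""
    n, dims = len(config), len(config[0])
    means = [sum(row[k] for row in config) / n for k in range(dims)]
    x = [[row[k] - means[k] for k in range(dims)] for row in config]
    ybar = sum(criterion) / n
    y = [v - ybar for v in criterion]
    xtx = [[sum(r[a] * r[b] for r in x) for b in range(dims)] for a in range(dims)]
    xty = [sum(r[a] * yi for r, yi in zip(x, y)) for a in range(dims)]
    beta = solve(xtx, xty)                      # least-squares weights
    fitted = [sum(bk * rk for bk, rk in zip(beta, r)) for r in x]
    multiple_r = math.sqrt(sum(f * f for f in fitted) / sum(v * v for v in y))
    length = math.sqrt(sum(bk * bk for bk in beta))
    return multiple_r, [bk / length for bk in beta]

# Invented example: a criterion that lies exactly along the direction (3, 4).
config = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
criterion = [3 * a + 4 * b + 5 for a, b in config]
r, cosines = fit_vector(config, criterion)
print(round(r, 3), [round(c, 3) for c in cosines])  # 1.0 [0.6, 0.8]
```

In the analyses reported here, `config` would hold the 169 tasks in ten dimensions and the fit would be repeated for each criterion in turn.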


The ROS were chosen as the primary reference group in these
analyses because: (a) they generally had more experience with the tasks
than did individuals in the other groups, and (b) they were all senior
supervisors, trainers, or responsible for maintaining
proficiency/performance standards. For present purposes, this latter
point is particularly relevant. The three functions subsumed within it
imply that these individuals should tend to represent what might be called
the "official point of view" about technical aspects of task performance.

The matrix of direction cosines in each of the tables shows that
these fitted ROS vectors and the initial configuration dimensions for each
group corresponded in a one to one fashion. While the SUPS, OBS1, and
OBS2 configurations differed little in the degree to which they related to
the ROS dimensions, the STUDS data did not show as much correspondence.


Fleishman (1967b) has shown that as people become more proficient
at complex tasks, different kinds of abilities contribute to performance.
This might explain why the STUDS' configuration did not show as much
correspondence to that of the ROS as did those of the more experienced
groups. Fleishman's observation would also lead one to expect that those
groups most alike in experience and proficiency would perceive underlying
dimensions of their jobs more similarly than groups less alike on these
variables. To test whether this might be borne out in the present data,
Rs were calculated between dimensions from the OBS1 configuration and each
of the OBS2 dimensions. Table 6 summarizes the results of these analyses.


Insert Table 6 about here


These  two  configurations  can  be  seen  to  correspond  very  highly,  both  in 
content  and  orientation. 


Cosines  of  .00  existed  between  the  original  MDSCAL  ROS  dimensions. 
In Tables 2 to 5, however, one observes that many of the cosines
(correlations) between the ROS vectors, when fitted into the
configurations of remaining groups, are considerably larger than .00 in
absolute value. (A cosine or correlation of .00 denotes an angle of 90°.)
For example, eight, nine, six, and twelve ROS intervector cosines in the
SUPS, OBS1, OBS2, and STUDS configurations, respectively, exceeded .30 in
absolute  value.  Seven  of  the  cosines  in  the  STUDS  configuration  had  more 
than  three  values  this  large.  Further,  the  two  largest  values  in  the 
STUDS  configuration  (.62  and  -  .75)  far  exceeded  the  next  largest  values 
in  any  of  the  other  configurations.  This  evidence  strongly  suggests  that 
the  groups  responded  to  the  task  functions  in  systematically  different 
ways.  It  also  indicates  that  the  STUDS  differed  more  from  the  ROS  in  this 
regard  than  did  the  other  groups. 


Contrary to expectation, however, the SUPS did not appear to be
significantly more like the ROS than did the OBS1 and OBS2. (A probable
reason for this is given in Rampton (Note 8), and relates to historical
training and experience commonalities shared by the OBS1, OBS2 and ROS
that were not as similar for many of the SUPS.) Smaller intervector
correlations did result when the OBS2 content dimensions were inserted
into the OBS1 configuration (as shown in Table 6, only five of the OBS2
fitted vectors in the OBS1 configuration exceeded .30). This indicates
that the OBS1 and OBS2 formed a relatively homogeneous dyad when
considered in the context of the four groups of "skilled" participants.
In total, this evidence is taken as supporting a contention that the more
similar two groups are in skill, experience level, and other
characteristics, the more similarly they will perceive salient aspects of
their work.


Relation Between Configurations and Rated Properties


Average unidimensional ratings for each of the 169 task statements
on the seven variables defined earlier (see Rampton, Note 8, for the
instructions and format under which these scales were administered) were
calculated by taking arithmetic means of respective task ratings over all
individuals. Numbers of respondents per scale were: Concentration (37),
Difficulty (38), Manual Skill (41), Importance (37), Cooperation (39),
Speed (39), and Mental Effort (36). These numbers were not all equal
because: (a) the scales Importance, Concentration, and Mental Effort
for one of the OBS2 individuals were not completed properly and had to be
discarded, and (b) the fourth Youden square was incomplete (containing but
six participants), so that the answer sheet balance inherent in each
complete block was unfulfilled in the last one.


REGRESS was used to locate vector orientations in the AVE, ROS,
SUPS, OBS1, OBS2, and STUDS configurations showing maximum correspondence
to each of the average rated properties. The results of these analyses
are shown in Tables 7 to 12. The format of these tables parallels that of
Tables 2 to 6. Multiple correlations or Rs between the 10 dimensions for
each group and each of the seven rated properties are shown as the first
line of numbers in each table. Each table also provides a matrix of
direction cosines showing how the fitted vectors were oriented in the
respective configural spaces, as well as a matrix of cosines showing the
interrelationships of the vectors in the space.


Tables  13  to  22  represent  an  attempt  to  interpret  each  of  the  ten 
AVE  content  dimensions  produced  in  the  analyses.  Each  table  contains  a 
statement  summarizing  the  definitions  derived  for  each  dimension.  Each 
also  contains  a  listing  of  twenty  of  the  more  salient  tasks  (ten  on  each 
end  of  the  dimension)  to  serve  as  typical  representatives  of  these 
constructs.  Interpreting  these  dimensions  turned  out  to  be  a  complex 
process. While the data in Tables 7 to 12 were the primary sources of
information used, simultaneous consideration of this information with
virtually all that contained in Tables 2 to 6, and the loadings of the
tasks on the dimensions, was necessary. It was quickly discovered that an
intimate working knowledge of each of the tasks was also essential, and
the author was fortunate in being able to rely on colleagues at the
Canadian Forces Personnel Applied Research Unit (having considerable
experience with the content domain under study) and experienced
Navigator/Radio Officers working at military establishments in the local
area to assist him in this regard.


While the Rs in Tables 7 to 12 are all significant at p<.01, they
are  generally  small  to  moderate  in  absolute  value.  (This  is  perhaps 
understandable  given  the  inherent  limitations  of  this  kind  of  criterion 
measure).  Further,  one  notices  similar  multiple  correlation  and  direction 
cosine  profiles  across  tables.  Difficulty  and  Manual  Skill  tend  to  have 
lowest  saliences  in  each  configuration;  Importance,  Cooperation,  and 
Mental  Effort  generally  have  moderate  salience;  while  Speed  and 
Cooperation  typically  show  largest  relationship. 


The  matrices  of  direction  cosines  showing  correspondence  between 
the  fitted  vectors  and  the  configural  dimensions,  as  well  as  the  matrices 
of  cosines  of  angles  between  property  vectors  in  the  configurations  were 
useful  for  interpretive  purposes.  They  depict  the  relationships  among  the 
properties  and  dimensions  as  well  as  the  interrelationships  between  the 
properties  when  located  in  the  configuration. 


In  examining  the  evidence  in  Tables  7  -  12,  it  is  important  to 
remember  that  though  the  Youden  square  arrangement  was  set  up  to  balance 
presentation  of  the  scales,  this  balance  was  not  complete  since  the  last 
experimental  block  was  only  partially  filled.  One  should  also  be  aware 
that  the  properties  are  somewhat  confounded.  This  granted,  it  is  apparent 
from  the  matrices  of  cosines  of  angles  between  the  property  vectors  in  the 
configurations,  that  the  property  ratings  reflected  more  than  subject 
variance  confounding,  or  halo.  Further,  the  relationship  profiles  are 
reasonably  consistent  across  all  configurations,  and  make  a  great  deal  of 
intuitive sense. For example, the properties Concentration, Difficulty,
Importance, Speed, and Mental Effort show moderate to large
interrelationships. The only variable which shows consistent relationship
to  Cooperation  is  Manual  Skill,  reflecting  the  fact  that  many  of  the 
heavy,  physical  tasks  done  on  the  Argus  aircraft  by  an  Air  Observer  are 
typically  done  in  cooperation  with  someone  else. 


Implications and Possible Research Extensions


The results of Tables 2 to 12 make sense when considered solely on
the basis of the structural representations, as well as when considered in
the context of external criteria. This augurs well for the validity of
the form and content of the dimensions produced, and thus the methodology
used to produce them. However, some question might still remain as to the
relevance of these data in the context of the "criterion problem". One
might, for example, question whether the task functions as derived were
the most useful entities on which to base the dimensional analyses, and
whether actual or simulated job problems (such as the circuit types
typically repaired by naval aviation technicians used by Schultz & Siegel,
Note 10, or the simulated air traffic control situations used by Landis,
Silver, Jones & Messick, 1967) might not be more appropriate. These and a
number of related issues are discussed in succeeding paragraphs.


Appropriateness of Task Functions as Basic Analytic Units


The Issue of Operator vs. Task Characteristics. The basis on
which similarity or dissimilarity judgements are made in an MDS study must
have an important bearing on results. Both the Schultz and Siegel (1962)
and the Landis et al. (1967) studies required participants to make
proximity judgements on the basis of the similarity of the stimuli per se,
while the Air Observers and Radio Officers were asked to make their
decisions on the basis of how similar the tasks were to do. This
difference in emphasis is believed important. In a sense, the distinction
relates to the difference between the analysis of task performance in
terms of operator characteristics versus task characteristics discussed by
various authors (e.g., Prien & Ronan, 1971; Wheaton, Note 11). It seems
obvious that dimensions produced from MDS of proximities based solely on
judgements of similarity (or dissimilarity) of job problems or the like
must have a primarily "task characteristics" perspective, and inferences
about ability/skill components will likely be possible only indirectly,
through consideration of the configurations in the context of personal
correlates (as was done by Landis et al.).


In the present investigation, an attempt was made to maintain an
"operator" perspective. This was the reason for orienting instructions so
that participants would respond on the basis of how similar the tasks were
to do rather than on the basis of other attributes. Although analyses
based on judgements of task similarity per se may be of interest in other
applications, the outcome would not likely reflect the kinds of criterion
(e.g., ability/skill) dimensions under investigation here. These comments
should not, however, be construed as criticism of other approaches. For
example, the purpose of the Landis et al. investigation differed from the
one reported here. Thus, even if their approach were applied to the
present situation, dissimilar (though hopefully complementary) results
would be expected. A potentially useful extension to the research in this
investigation, in fact, might be to prepare a number of tactical or other
situations typically faced by Air Observers and to use these in place of
the task functions in proximity generation and MDS procedures analogous to
those outlined earlier. A comparison of the dimensions produced with
those in this project could then be made. One would not expect the two
sets of configurations to overlap completely, but one would hope that they
would be meaningfully relatable to each other.


Number and Specificity of Task Entities. In reviewing the
literature for studies that have used MDS or related analytic approaches
in task analysis, one is struck by the relatively small number of entities
(task functions, simulated aerodrome situations, etc.) typically
involved. For example, Brown (1967) used a sample of 18 task statements.
Siegel  and  Schultz  (Note  12)  also  used  18  task  statements,  Smith  and 
Siegel  (1967)  used  3^  task  functions,  and  Landis  et  al.  (1967)  used  30 
simulated  air  traffic  control  problems.  In  many  of  these  applications, 
the investigators may have been limited either by the numbers of objects
their  computer  programmes  could  handle,  or  by  the  amount  of  labour  their 
method  of  generating  proximities  required  of  participants. 


A  number  of  procedures  have  been  designed  to  economize  on 
participant  labour.  Other  procedures  have  been  created  which  use  separate 
computer  runs  to  build  up  MDS  solutions  containing  more  objects  than  can 
be handled in a single run. Kruskal et al. (Note 7) provide a brief
introduction to some of these techniques. Alternatively, if one has
access to sufficient computing resources, it is sometimes possible to
enlarge the  computer  programmes  to  handle  as  many  stimuli  as  are 
needed.  Although  these  procedures  have  been  available  for  some  time,  most 
investigators  have  either  reduced  the  number  of  tasks  by  picking  a  small 
sample  of  all  those  possible,  or  defined  the  tasks  at  such  a  gross  level 
of  generality  that  a  small  number  provided  a  global  description  of  the 
job. 


There  are  a  number  of  potential  difficulties  in  having  relatively 
few task statements in an MDS analysis: with a small number of objects
(and thus interrelationships) one cannot possibly obtain many dimensions,
even if a larger number exists in the content domain. Further, the
smaller the ratio of number of objects to number of dimensions, the less
reliable  or  tightly  constrained  will  be  the  final  configuration. 
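
The arithmetic behind this point can be sketched briefly. The Python fragment below is an illustration only: the function names are hypothetical, and the ratio of proximities to coordinates is used here as one rough, assumed indicator of how tightly a configuration is constrained, not as a criterion taken from the MDS literature. It counts the n(n-1)/2 distinct interobject proximities available for n objects and divides by the n x k coordinates that must be estimated in k dimensions.

```python
def n_proximities(n_objects):
    """Number of distinct pairs among n_objects: n(n-1)/2."""
    return n_objects * (n_objects - 1) // 2


def proximities_per_coordinate(n_objects, n_dimensions):
    """Rough constraint index: distinct proximities per estimated coordinate."""
    return n_proximities(n_objects) / (n_objects * n_dimensions)


# 18 tasks scaled in 4 dimensions (as in the earlier electronics technician
# study) versus the 169 tasks and 10 dimensions of the present study.
for n, k in [(18, 4), (169, 10)]:
    print(n, k, n_proximities(n), proximities_per_coordinate(n, k))
```

With 18 objects there are only 153 distinct proximities, whereas 169 objects yield 14,196, so each coordinate of the larger configuration is supported by considerably more data.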


In  using  MDS  in  task  analysis,  it  is  important  to  recognize  that 
the  level  of  generality  with  which  the  task  statements  are  derived,  though 
somewhat  arbitrary,  will  have  an  important  bearing  on  the  results.  For 
example,  except  for  some  work  to  make  level  of  abstraction  a  bit  more 
equitable, and to recast some in language that would be more readily
understood by participants, the total set of 350 task elements might have
served as the basis for similarity judgements and subsequent MDS analyses.


1. Enlarging the MDSCAL and REGRESS programmes for this research project
proved somewhat complicated. Copies of these computer programmes may
be obtained from the author.

Vernon  (1965)  has  proposed  that  one  can  view  skilled  behaviour  as 
being  structured  hierarchically.  At  the  apex  are  broad  factors,  each 
accounting  for  performance  in  a  wide  range  of  tasks.  Below  these,  and 
serving  as  building  blocks  for  them,  are  successive  hierarchical  layers  of 
increasingly  specific  abilities,  which,  though  pertaining  to  the  same 
variance  as  the  layers  above,  also  account  for  some  of  the  variance  in 
more  disparate  tasks.  From  this  perspective,  the  more  specific  and 
detailed  one  can  be  in  generating  task  statements,  the  better.  Thus,  a 
comprehensive,  specific  list  composed  of  many  items  should,  other  things 
being  equal,  account  for  more  content  variance  than  a  general  list 
composed  of  few.  Within  this  conceptual  framework,  it  should  always  be 
possible  to  produce  a  more  general  MDS  configuration  from  a  specific  one 
by  suitable  rotation  and/or  clustering  procedures,  but  not  the 
converse: that is, one could not move from a general solution to a specific
solution.
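
One simple way to move from a specific configuration to a more general one is to replace each cluster of tasks by its centroid. The sketch below assumes that cluster labels are already available from some clustering procedure; the function name, the coordinates and the labels are hypothetical and serve only to illustrate the direction of the argument. Averaging discards the within-cluster detail, which is why the converse step is not possible.

```python
def cluster_centroids(coords, labels):
    """Average the coordinates of all tasks that share a cluster label."""
    sums, counts = {}, {}
    for point, label in zip(coords, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


# Hypothetical two-dimensional coordinates for four specific tasks,
# grouped into two broader task clusters "a" and "b".
specific = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.0, 0.8)]
labels = ["a", "a", "b", "b"]
general = cluster_centroids(specific, labels)
print(general)
```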


However,  one  is  limited  in  the  extent  to  which  one  can  handle  a 
long,  detailed  task  statement  list  by  the  purely  practical  considerations 
of  work  capacity  of  participants,  and  computer  resources.  Even  with  a 
major effort to economize, both of these resources were stretched about as
far as they could be in this study. Thus the 169 task statements reflect a
compromise between the desirability of ever more detail and
practical resource constraints.


Generalizing Across Work Situations


The  task  analysis  methodology  used  to  investigate  the  Air  Observer 
trade  was  designed  to  be  as  general  as  possible  so  that  the  same  format 
could  be  used  in  many  work  situations.  This  was  one  of  the  major  reasons 
for deciding on the two-step task descriptive phase of: (a) breaking the
job down into task elements and categorizing these according to an
established task taxonomy, and (b) summarizing and rewording the content
from the previous step into task functions designed to be at about the
same level of generality and in language that could be understood by
participants. With this process as a means of defining the content to
study in each application, it should be possible to apply the methodology
across  trades  with  only  minor  adjustments. 


In  doing  across-trade  comparisons,  one  might  start  with  a  number  of 
trades, each sharing content with at least some of the others. That is,
one might have trade A sharing some components with trade B, trade B with
trade  C,  A  with  D,  and  perhaps  (but  not  necessarily)  A  with  C.  Note  that 
each  trade  would  not  have  to  overlap  with  every  other  trade  in  the  set, 
and  that  there  need  be  no  limit  to  the  number  of  trades  involved  (except 
regarding  practical  constraints).  One  can  imagine  the  situation  as  being 
represented  by  a  Venn  diagram  of  overlapping  circles,  each  circle 
representing  a  trade.  With  the  adding  of  more  trades,  it  is  possible  that 
any  two  (say  A  and  Z),  though  linked  together  by  a  pathway  of  other 
overlapping  trades,  might  share  no  common  variance. 


Taking  any  two  trades,  for  example  A  and  B,  one  could  process  each 
through  the  task  descriptive  phase  of  the  task  analysis  methodology, 
ensuring that the task functions derived for each were at about the same
level of generality. Identically worded task functions corresponding to
the  content  shared  by  the  two  trades  would  be  generated  and  included  in 
the  total  set  of  task  functions  for  each  trade. 


Suppose  that  100  and  109  task  functions  were  created  for    trades  A 
and    B    respectively,  and  of  these,  ^1  were  common.    One  would  run  through 
the  proximity  generation,  MDS,  and  other  methodological    phases    for  each 
trade.      Then,    the    i^l    common    task  functions  could  be  used  as  a  nucleus 
around  which  to  build  a  combined  A  +  B  configuration  using  the  FIX  option 
of an enlarged KYST computer programme (Kruskal et al., Note 7). One
could repeat the procedure by adding trade C to obtain a combined A + B +
C configuration, and/or separate A + C, B + C configurations. The FIX
option of KYST, in conjunction with suitable algebraic manipulations, should
allow  one  to  infer  interrelationships  of  tasks  not  shared  by  two  jobs  from 
knowledge  of  interrelationships  of  those  that  were.      One    would    want  to 
build    in    a    number    of    cross    checks    to    ensure    that  the  results  were 
consistent  (i.e.,  in  the  configuration  combination  A  +  B    +    C    one  might 
start  the  process  from  different  points  —  e.g.,  C  and  B  rather  than  A  and 
B,  to  ensure  the  end  result  was  the  same). 
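
The idea of using common tasks as a nucleus can be illustrated with a much simpler stand-in than the FIX option of KYST, whose internal workings are not described here. The sketch below uses a least-squares rigid motion (rotation plus translation), which is the two-dimensional special case of orthogonal Procrustes alignment, to express trade B's configuration in the frame of trade A by way of the tasks they share. All names and coordinates are hypothetical; reflection and rescaling are deliberately ignored, and real configurations would have more than two dimensions.

```python
import math


def centroid(points):
    """Mean x and y of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_rigid_motion(common_a, common_b):
    """Least-squares rotation angle (radians) and centroids that map B's
    coordinates of the common tasks onto A's coordinates of the same tasks."""
    ca, cb = centroid(common_a), centroid(common_b)
    dot_sum = cross_sum = 0.0
    for (xa, ya), (xb, yb) in zip(common_a, common_b):
        xa, ya, xb, yb = xa - ca[0], ya - ca[1], xb - cb[0], yb - cb[1]
        dot_sum += xb * xa + yb * ya
        cross_sum += xb * ya - yb * xa
    return math.atan2(cross_sum, dot_sum), ca, cb


def to_a_frame(points_b, angle, ca, cb):
    """Express points from configuration B in the frame of configuration A."""
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    moved = []
    for x, y in points_b:
        x, y = x - cb[0], y - cb[1]
        moved.append((x * cos_t - y * sin_t + ca[0],
                      x * sin_t + y * cos_t + ca[1]))
    return moved


# Hypothetical coordinates of three common tasks in each configuration.
common_in_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
common_in_b = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
# A hypothetical task that appears only in trade B.
unique_to_b = [(1.0, 4.0)]

angle, ca, cb = fit_rigid_motion(common_in_a, common_in_b)
print(to_a_frame(unique_to_b, angle, ca, cb))
```

Repeating the fit from a different starting pair of trades, as suggested above, would give a direct check on the consistency of the combined configuration.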


In effect, the procedures as outlined should allow one to predict
analytically how tasks from different work situations might relate to
each other if they were together, and could thus represent a powerful tool
for  the  structuring  and  restructuring  of  jobs.  Another  important  use  of 
these  procedures,  of  course,  would  be  as  a  means  of  integrating  task 
analysis  data  from  complex  work  environments. 


The MDS Dimensions as Criteria


Before the true significance of the methodology illustrated in this
paper can be evaluated, it is necessary to establish how well, or indeed
whether, the "potential" of the approach is translatable into reality. In
the  context  of  aptitude  test  development  for  example,  one  might  ask 
whether  any  of  the  10  performance  dimensions  were  suggestive  of  kinds  of 
tests  that  might  predict  training  or  job  success  in  an  applicant  to  the 
Air  Observer  trade.  The  following  paragraphs  outline  research  bearing  on 
this  point.  It  was  conducted  by  a  colleague  of  the  author's  at  the 
Canadian  Forces  Personnel  Applied  Research  Unit  and  is  more  fully 
documented  elsewhere  (Fournier,  Note  18). 


The Criterion Dimensions as a Basis for Developing Aptitude Tests


Early  in  the  task  analysis  program  while  observing  the  Air  Observer 
at  work,  it  was  noted  that  much  of  the  job  entails  processing  information 
from  two  or  more  sources  at  once,  particularly  in  visual  and  auditory 
modes.  This  observation  was  substantiated  in  later  stages  of  the  analyses 
by  the  appearance  of  Dimensions  I  and  II  as  defined  in  Tables  13  and  14. 
As  Fournier  (Note  18)  states: 


For  example,  all  crew  members  must  monitor  the 
Intercommunication  system  while  performing  visual 
detection  functions.  Many  of  the  work  stations  require 
the  operator  to  manipulate  equipment,  monitor  for 
targets,  monitor  for  equipment  malfunctions,  and  report 
status  of  detections  while  maintaining  currency  with  the 
tactical  scenario  and  crew  communications  (p.  1-2) 


The  fact  that  individuals  respond  differently  when  two  or  more 
physical or perceptual demands are made simultaneously than when either
is presented singly has been noted for some time. For example, Chiles,
Alluisi  and  Adams  (1968),  and  Chiles  and  Jennings  (Note  1^)  have  suggested 
that individuals differ in their ability to "time-share" or "shift gears"
from  the  requirements  of  one  aspect  to  another.  These  authors  have  even 
implied  that,  in  eliciting  these  differences,  the  nature  of  the  task  is 
not as important as the level of time sharing on the part of the operator.


Thus far, the effects on performance of having to "time-share" have
been studied primarily in "dual-task" contexts in which information
processing or action is required on stimuli presented simultaneously from
both a primary and a secondary source. Performance measures taken under
these conditions are compared with those taken when the stimuli from each
source are presented separately. A drop in performance from the
single-task situation to the dual-task situation is generally noted (see
Johnston, Greenberg, Fisher & Martin, 1970; Posner & Boies, 1971;
Shulman & Greenberg, 1971; Smith, 1969; Taylor, Lindsay & Forbes, 1967).
It  has  been  suggested  that  the  drop  in  performance  from  single-task 
presentation  to  dual-task  presentation  is  inversely  proportional  to  the 
"spare  processing  capacity"  of  the  operator  when  he  handles  the  primary 
task  situation  alone  (Brown,  1962). 
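
The comparison described above reduces to a simple difference score. The sketch below is illustrative only: the function names and the scores are hypothetical, and the relative (proportional) form of the decrement is an added assumption for comparing operators with different single-task levels, not a measure taken from the studies cited.

```python
def dual_task_decrement(single_score, dual_score):
    """Absolute drop in performance from single-task to dual-task conditions."""
    return single_score - dual_score


def relative_decrement(single_score, dual_score):
    """Drop expressed as a proportion of single-task performance."""
    if single_score == 0:
        raise ValueError("single-task score must be non-zero")
    return (single_score - dual_score) / single_score


# Hypothetical primary-task scores (items correct) for two operators,
# each given as (single-task score, dual-task score).
scores = {"operator_1": (40, 30), "operator_2": (40, 38)}
for name, (single, dual) in scores.items():
    print(name, dual_task_decrement(single, dual),
          relative_decrement(single, dual))
```

On this reading, the operator with the smaller decrement would be credited with the greater spare processing capacity.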


It  is  obvious  from  a  number  of  the  tasks  loading  at  the  low  end  of 
Dimension I that the performance required simply to do a task is not
necessarily what makes it complex for the Air Observer. The milieu, or
what may be going on when the task is performed, is also significant. For
example,  tasks  and  U5  may  not  be  complex  to  perform  in  and  of 
themselves,  but  when  they  must  be  done  under  operational  conditions,  the 
situation  can  be  complex.  In  this  context,  the  individual  must 
simultaneously  process  information  from  a  variety  of  sources,  as  well  as 
perform  a  number  of  other  functions.  Dimension  II  reflects  even  more 
directly  a  general  requirement  on  the  part  of  the  Observer  to  handle  dual 
or multi-source tasks. Tasks loading at the high end of this dimension
tend to be those in which an individual must accumulate, process and
synthesize  information  (often  received  simultaneously)  from  several 
sources  before  making  a  decision. 


On  the  basis  of  the  evidence  that  the  ability  to  simultaneously 
process  information  from  more  than  one  source  was  important  to  an  Air 
Observer,  a  dual-task  situation  was  created  by  presenting:  (a)  a  primary 
task consisting of a number of slides each showing five pictures of
aircraft in different orientations and attitudes along with readings on
two aircraft instruments (artificial horizon and compass), and (b) a
secondary task consisting of an auditorily presented series of random
digits  with  a  presentation  rate  of  two  seconds. 


Forty-nine  Observers  (some  of  whom  had  participated  in  the  task  analysis 
study)  were  asked  to  select  the  aircraft  picture  corresponding  to 
information  presented  on  the  instruments  while  repeating  aloud,  in 
sequence,  one  random  digit  after  being  given  the  next. 


The  psychometric  qualities  of  the  dual-task  measures  and  their 
relation to on-job criteria in "concurrent validity comparisons" are
presented in detail elsewhere (Fournier, Note 18). The following
quotation  from  this  source  is  provided  as  a  succinct  statement  of  some  of 
these  findings: 


Measures of the drop in performance (dual-task
decrement) observed when the two tasks were combined
compared to performance levels when done separately,
showed that some Observers were able to perform in the
dual-task situation better than others. The dual-task
measures were not significantly related to operational
experience...the Radar Simulator criterion...was
significantly  related  to  other  criterion  measures  but 
did  not  appear  to  reflect  a  large  dual-task  component. 


Dual-task test measures were systematically
(significantly)  and  positively  related  to  job-related 
performance  measures  including  supervisor  rank  ordering, 
peer  ratings,  final  radar  training  grades,  and  three 
indices  produced  by  combining  the  subjective  and 
objective  criteria  measures  (p.  11). 


In addition, though requiring confirmation by cross-validation,
the evidence suggested that the dual-task measure offered a significant
prediction increment when combined with selection procedures already in
use. 


The  above  example  shows  but  one  way  the  information  from  Tables  13 
to  22  could  be  used  as  a  basis  for  creating  aptitude  measures.  Another 
strategy would be to examine tasks loading at either pole of specific
content dimensions with a view to developing task replicas as measures of
aptitude. In creating these instruments one would, of course, try to
limit those task aspects that are dependent on specific previous
learning. The use of work samples as predictors of later
success in training and/or work situations is an established practice (see
Cronbach, 1960). In fact, the only departure from tradition proposed
here is that the work sample would be selected on the basis of prior
evidence that it contained a large component of a previously identified
construct. More conventionally, job samples are generally introduced on a
cut and try, intuitive basis. Then, if the resulting instrument predicts
adequately, it survives. Often, however, it is difficult to determine
exactly what is being measured in these applications.


One  notes  from  the  interpretations  given  in  Tables  13  to  22  that 
only  Dimensions  I,  II,  III,  VII  and  perhaps  VI  seem  to  reflect 
aptitude-like  qualities.  Remaining  dimensions  appear  to  have  more  to  do 
with  task  interrelationships  and  the  milieu  in  which  they  are  performed. 
Thus there is some evidence that the content dimensions are of different
types. Participants in the study were asked to sort on the basis of how
similar the tasks were to do. Therefore, given that attributes other than
aptitude-related ones were relevant in making the sorting decisions, one
would expect these to be reflected in the results.


Criterion dimensions identified and interpreted as in this study
can  be  used  to  help  decide  what  aptitude  measures  might  be  useful  in  a 
particular  application.  After  this  has  been  done,  and  the  instruments 
prepared,  one  could  use  information  arising  from  these  dimensions  to 
suggest  the  form  and  content  of  criterion  data  to  use  in  validating  the 
tests.  There  are  a  number  of  ways  that  the  information  from  Tables  13-22 
could  assist  in  this  process. 


Criterion  Data  Collection  Procedures 


Evidence  that  reliable  unidimensional  scales  can  be  generated  from 
the dimensions produced in MDS studies has been provided by Schultz and
Siegel (Note Iff). These studies were conducted before the advent of
recent MDS technology. As a consequence they were restricted to rather
limited  sets  of  stimuli.  In  spite  of  these  limitations,  these  studies 
contain  implications  for  significant  extensions  to  the  research 
methodology  illustrated  in  the  Air  Observer  research  programme. 


In one study, for example, Schultz and Siegel (Note 10) used MDS to
investigate content dimensions underlying successive interval judgements
of 18 tasks associated with the trade of electronics technician in the
U.S. Navy. The following four dimensions were produced:
Electro-Comprehension, Equipment Operation and Inspection, Electro-Repair,
and Electro-Safety. Taking dimension definitions derived from task
loadings on each of the dimensions, the authors asked technicians to: (a)
judge each task on the basis of its perceived relationship to each of the
four dimensions; and (b) think of and evaluate other technicians on the
task as viewed from the dimension definitions. From these judgements
unidimensional scales were produced which met Thurstone and Guttman
scaling requirements. For example, the indices of consistency I, which
Green (1955) states should be .50 or higher before a set of items can be
considered to scale in the Guttman sense, were: Electro-Comprehension
(.62), Equipment Operation and Inspection (.68), Electro-Repair (.74), and
Electro-Safety (.77). Correlations between the direct task ratings on
each of the defined dimensions and the task loadings on each dimension
produced in the MDS analyses were Electro-Comprehension (.88), Operation
and Inspection (.79), Electro-Repair (.67), and Electro-Safety (.50).


Generalizing from the Schultz and Siegel research program to that
outlined in this paper has limitations because of the small number (18) of
stimuli  used  in  the  former.  The  investigators  were  undoubtedly  restricted 
by  the  number  of  variables  their  computer  programmes  could  handle.  The 
task  analysis  literature,  however,  (some  of  which  was  summarized  earlier) 
suggests  that  this  list  of  18  tasks  was  either  incomplete  as  a  reflection 
of  a  skilled  trade  like  Electronics  Technician,  or  too  general  to  serve  as 
the  basis  of  a  task  analysis  in  any  realistic  sense. 


Following  the  lead  of  Schultz  and  Siegel  (Note  15),  one  might  use 
the definitions in Tables 13 to 22 as a basis for generating separate
unidimensional  scales.  The  correlations  between  task  scale  values  on 
these  scales  and  the  task  projections  on  the  respective  MDS  dimensions 
would  serve  as  indices  of  the  adequacy  of  the  scale  development  process. 
Large  values  would  give  one  confidence  in  the  unidimensional  scales, 
attest  to  the  validity  of  the  MDS  results,  and  support  the  individual 
interpretations  ascribed.  If  some  of  the  correlations  were  too  small,  one 
might  try  adjusting  the  scale  definitions  and  then  redoing  the 
unidimensional  scaling.  Successive  iterations  with  this  strategy  should 
sharpen  the  dimensional  definitions.  If  the  definitions  of  certain 
dimensions  could  not  be  brought  into  focus,  this  would  serve  as  a  cue  that 
more  study  of  the  process  or  methodology  used  to  generate  them  was 
required. 
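
The adequacy index described above is an ordinary product-moment correlation between two sets of task values. The following sketch assumes hypothetical scale values and MDS projections for five tasks on one dimension; the function name and the data are illustrative and do not come from the Air Observer study.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five tasks on one dimension.
scale_values = [1.0, 2.5, 3.0, 4.5, 6.0]          # unidimensional scale
mds_projections = [-0.9, -0.3, 0.0, 0.4, 0.8]     # projections on MDS axis
print(pearson_r(scale_values, mds_projections))
```

A low value for a given dimension would be the cue, as noted above, to adjust the scale definition and repeat the unidimensional scaling.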


A number of conventional rating and ranking procedures (Torgerson,
1958)  could  be  used  to  compare  individual  performance  against  the 
dimension  definitions  for  purposes  of  collecting  criteria  data  for  test 
validation,  and/or  performance  evaluation  for  promotions,  job  transfers, 
special  assignments,  and  so  on.  Alternatively,  one  could  investigate  the 
feasibility  of  using  task  loadings  on  the  content  dimensions  as  the  basis 
for  "behaviourally  based  rating  scales"  analogous  to  those  developed  by 
Campbell,  Dunnette,  Arvey,  and  Hellervik  (1973);  Fogli,  Hulin,  and  Blood 
(1971); Landy and Guion (1970); Smith and Kendall (1963); and Zedeck and
Baker (1972). Since task scale values on each of the derived dimensions
are already available, it would be possible to circumvent many of the
scale development phases described by the above authors.


Suppose one had MDS content dimensions that were subsequently
defined and redefined using Guttman procedures as described earlier in
this section. Then to form a behaviourally based scale one would need
only to select a number of tasks that were reasonably well distributed
along the dimension. (Though perhaps not absolutely essential, it might
be wise to select tasks loading only on the dimension of interest.) Since
the MDS/Guttman scaling procedures should have provided reasonably clear
scale definitions, interpretation problems should be minimal if raters
could be induced to concentrate (i.e., in comparing ratee performance to
that required to do the task at a certain level of proficiency) only on
the dimensional description of interest.
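
The selection rule just described can be sketched as follows. The code is a hypothetical illustration: the function name, the cut-off used to decide that a task loads "only" on the dimension of interest (max_other), and the loadings themselves are all assumptions. Tasks with appreciable loadings on other dimensions are set aside, and anchors are then chosen nearest to evenly spaced points along the dimension.

```python
def select_anchor_tasks(loadings, dim, n_anchors, max_other=0.3):
    """Pick n_anchors tasks spread along dimension `dim`, using only tasks
    whose absolute loadings on every other dimension are <= max_other."""
    eligible = sorted(
        (values[dim], task) for task, values in loadings.items()
        if all(abs(v) <= max_other for i, v in enumerate(values) if i != dim)
    )
    if n_anchors < 1 or len(eligible) < n_anchors:
        raise ValueError("not enough eligible tasks")
    if n_anchors == 1:
        return [eligible[len(eligible) // 2][1]]
    low, high = eligible[0][0], eligible[-1][0]
    chosen = []
    for i in range(n_anchors):
        target = low + i * (high - low) / (n_anchors - 1)
        remaining = [e for e in eligible if e[1] not in chosen]
        best = min(remaining, key=lambda e: abs(e[0] - target))
        chosen.append(best[1])
    return chosen


# Hypothetical loadings of six tasks on two dimensions.
loadings = {
    "T1": [-0.9, 0.1], "T2": [-0.5, 0.8], "T3": [-0.4, 0.0],
    "T4": [0.1, -0.2], "T5": [0.5, 0.1], "T6": [0.9, 0.2],
}
print(select_anchor_tasks(loadings, dim=0, n_anchors=3))
```

In this example T2 is excluded because of its loading on the second dimension, and the three anchors are taken from the low end, the middle and the high end of the first.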


Conclusion

The research programme outlined in this paper was predicated on a
contention that adequate technology for delineating
performance/behavioural dimensions inherent in job tasks exists, and
requires  only  to  be  organized  and  implemented  in  a  systematic  way.  One 
set  of  procedures  for  doing  this  was  presented  by  illustration  in  a  task 
analysis  of  the  Air  Observer  trade  in  the  Canadian  Forces.  The  results  of 
these  analyses:  were  reliable  and  internally  consistent  within  relatively 
homogeneous  groups  of  individuals;  were  readily  and  meaningfully 
generalizable  across  a  variety  of  work  situations  (experience  and 
responsibility  levels);  showed  promise  of  being  valid  in  terms  of 
producing  meaningful  results  showing  significant  relationship  to  external 
variables  and  being  readily  integrated  into  larger  bodies  of  scientific 
knowledge;    and  had  implications  that  could  be  extended  in  other  studies. 

There  is  no  intent  to  imply  that  the  methodology  outlined 
represents  a  panacea  for  the  "criterion  problem"  in  its  many  facets.  The 
goals  and  requirements  of  task  analysis  should  change  from  application  to 
application,  necessitating  corresponding  adjustments  in  research 
methodology.  The  taxonomy  and  judgemental  strategy  (e.g.,  sorting  or 
other  procedures)  for  generating  proximity  indices,  as  well  as  the  MDS 
models  and  other  analytic  techniques  should  be  tailored  to  suit  the 
specific  application. 

In  Cronbach  and  Gleser's  (1965)  terms,  the  procedures  outlined  in 
this paper must be considered somewhat "narrow band" but hopefully of
"high fidelity". Where data from "wider band" procedures were required
(e.g.,  when  investigating  trade  or  occupational  structures  in  large 
organizations),  other  kinds  of  methodologies  are  likely  to  be  required. 
This  qualification  being  granted,  the  evidence  suggests  that,  taken 
together,  the  kinds  of  procedures  used  in  the  Air  Observer  task  analysis 
represent  a  comprehensive,  integrated  research  methodology  not  previously 
available.  This  methodology  may  not  be  universally  appropriate,  but  when 
sensibly  and  appropriately  applied,  can  produce  reliable,  internally 
consistent,  and  valid  results  of  both  theoretical  and  practical  import. 
Table 1. Schematic summary of MDSCAL analyses.

STEPS                      COMMENTS

1. 60 TASKS SELECTED,      a. Tasks selected to sample a reasonably broad
   PROXIMITIES EXTRACTED      domain.
                           b. Corresponding 60(60-1)/2 intertask similarities
                              extracted from total 169(169-1)/2 ROS matrix.

2. KYST RUN,               As a prelude to Step 2, a number (4 or 5) of KYST
   YOUNG/TORGERSON         runs were conducted using Young/Torgerson and
   OPTION                  random initial configurations. Visual comparisons
                           indicated the solutions were quite similar.
                           Final KYST solution involved 60 tasks scaled in
                           6 dimensions.
                           After 9 preiterations stress two = .27. Minimum
                           reached after 7? iterations, stress two = .17.
                           Final configuration rotated to principal
                           components.

3. PARTIAL RANDOM (PR)     Last 4 columns (dimensions) added by inserting
   60X10 CONFIGURATION     random numbers from a rectangular distribution,
                           0.0 ≤ X ≤ 1.0.

4. PR 60X10 STARTS         Stress two started at about .30 and ended at .11
   MDSCAL 60X10 ROS        (stress one = .03) after about 90 iterations.
   ANALYSIS                During this step the MDSCAL program was enlarged,
                           and modified to write distances on disc. The ROS
                           proximity subsample was used as test data in
                           these runs. (The solution of the immediately
                           preceding run was used as the initial
                           configuration in each case.) The first run has
                           been mislaid so that approximate starting stress
                           and numbers of iteration values are provided from
                           memory. The solution reached minimum before the
                           modifications were complete so that the "about"
                           90 iterations were several more than were
                           actually needed. (The series of analyses
                           consisted of four runs in all.)

5. 60X10 ROS MDSCAL        The matrix augmented by random numbers
   AUGMENTED TO            (rectangular distribution ranging 0.0 < X < 1.0)
   PR 169X10               in appropriate rows to fill in 109 tasks not
   CONFIGURATION           represented in 60 x 10 configuration.

6. PR 169X10 STARTS        Minimum AVE configuration reached after 69
   MDSCAL OF ALL AVE       iterations.
   PROXIMITIES             Stress two started at .99 (stress one = .30) and
                           ended at .12 (stress one = .0?).

7. AVE FINAL STARTS        See group results below.
   REMAINING MDSCAL
   RUNS

GROUP               ROS        SUPS       OBS1       OBS2       [illegible]
STRESS TWO          .52->.17   .32->.16   .32->.16   .31->.15   ?->.16
STRESS ONE          .10->.05   .10->.05   .10->.05   .10->.0?   .13->.04
ITERATION NUMBER    49         34         34         46         47
REACHED MINIMUM     yes        yes        yes        yes        yes

The tail and head of arrows indicate respective start and finish stress
values.
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Table  2 


Multiple  Linear  Regression  of  Each  ROS  Dimension 
on  SUPS  Dimensions 


ROS  Dimensions 

I         11"       III      IV       M         VI       VII      VIII    XX  X 

Multiple  Correlations 

.93      .88      .88      .68      .87      .75      .77      .87      .75  .67 

Matrix  of  Direction  Cosines  Showing  Correspondence 
Between  Fitted  ROS  Vectors  and  SUPS  Dimensions 

SUPS 

Dimensions 


I 

.81 

-.06 

.11 

-.03 

.12 

.07 

.04 

.06 

-.02 

-.05 

II 

-.31 

.94 

.05 

.04 

.13 

-.12 

-.21 

-.23 

.02 

-.13 

III 

-.13 

-.18 

.90 

.10 

.26 

-.02 

-.14 

-.14 

.07 

.24 

IV 

-.36 

-.12 

.01 

.90 

.20 

-.01 

.02 

-.27 

.05 

-.04 

V 

-.07 

.04 

.18 

.15 

.85 

.14 

.02 

-.09 

.05 

.03 

VI 

.09 

-.13 

-.00 

.12 

.04 

.9S 

.11 

-.05 

-.06 

.03 

VII 

-.13 

-.16 

-.02 

-.01 

.05 

-.10 

.94 

-.12 

-.00 

.04 

VIII 

.25 

-.08 

-.35 

.12 

-.28 

-.18 

.02 

.90 

-.02 

.10 

IX 

-.08 

.11 

.07 

.28 

-.12 

-.20 

-.17 

-.09 

.98 

-.06 

X 

-.05 

-.01 

.01 

-.22 

.22 

-.06 

-.05 

.03 

-.14 

.95 

Matrix of Cosines Showing Relationships Between
ROS Vectors After Being Fitted into SUPS Configuration

ROS
Dimensions      I     II    III     IV      V     VI    VII   VIII     IX

I
II           -.30
III          -.16   -.08
IV           -.35   -.06    .07
V            -.18    .08    .52    .22
VI            .17   -.22    .06    .06    .19
VII          -.02   -.42    .08    .01    .09    .07
VIII          .49   -.23   -.46    .21   -.44   -.16   -.06
IX           -.14    .12    .13    .36   -.07   -.23  -.3.7
X            -.04   -.20    .23   -.22    .25   -.03    .09
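
The two matrices in Table 2 are linked: each fitted ROS vector is described by its direction cosines on the ten SUPS dimensions, and the cosine of the angle between two fitted vectors follows from those direction cosines. The sketch below illustrates this with the Table 2 entries for ROS vectors I and II (read down the first two columns of the direction cosine matrix). The helper names are hypothetical, and the paper does not state that its values were computed this way.

```python
import math

# Direction cosines of fitted ROS vectors I and II on SUPS dimensions I..X (Table 2).
ROS_I = [.81, -.31, -.13, -.36, -.07, .09, -.13, .25, -.08, -.05]
ROS_II = [-.06, .94, -.18, -.12, .04, -.13, -.16, -.08, .11, -.01]

def length(v):
    """Euclidean length; close to 1 for a full set of direction cosines."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (length(u) * length(v))

len_i = length(ROS_I)
cos_i_ii = cosine(ROS_I, ROS_II)
```

With the printed two-decimal entries, the length of vector I comes out at essentially 1 and the cosine between vectors I and II rounds to the -.30 listed in the second matrix.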
Table 3

Multiple Linear Regression of Each ROS Dimension
on OBS1 Dimensions


ROS Dimensions      I     II    III     IV      V     VI    VII   VIII     IX      X

Multiple
Correlations      .93    .87    .88    .76    .86    .77    .81    .83    .66    .74

Matrix  of  Direction  Cosines  Showing  Correspondence 
Between  Fitted  ROS  Vectors  and  OBSl  Dimensions 

OBSl 

Dimensions 


I 

.8? 

-.02 

.06 

-.08 

.02 

.11 

-.03 

.07 

-.02 

-.05 

II 

-.14 

.94 

-.14 

.05 

.11 

-.05 

-.13 

-.15 

-.03 

-.19 

III 

-.07 

-.14 

.91 

-.03 

.15 

-.04 

.06 

-.05 

.11 

.04 

IV 

-.29 

-.05 

-.11 

.90 

.19 

.04 

.08 

-.26 

-.08 

V 

-.18 

.Q4 

.19 

.14 

.92 

.  11 

.00 

-.17 

.04 

.15 

VI 

.02 

-.20 

-.01 

.22 

.14' 

.91 

.07 

-.12 

-.07 

-.02 

VII 

-.08 

-.13 

-.04 

.05 

.00 

-.06 

.98 

-.03 

.10 

.01 

VIII 

.29 

-.09 

-.27 

.10 

-.08 

•  .33 

.00 

.93 

-.06 

.01 

IX 

.02 

-.07 

.14 

.27 

.04 

-.19 

.03 

-00 

.90 

-.04 

X 

-.12 

-.16 

.02 

-.13 

.24 

-.04 

.07 

.02 

-.21 

.96 

Matrix of Cosines Showing Relationships Between
ROS Vectors After Being Fitted into OBS1 Configuration

ROS
Dimensions      I     II    III     IV      V     VI    VII   VIII     IX

I
II           -.14
III          -.07   -.22
IV           -.32   -.04   -.11
V            -.28    .05    .31    .30
VI            .00   -.17    .05    .16    .24
VII          -.12   -.28    .02    .13    .05    .00
VIII            7   -.19   -.27   -.21   -.32   -.42   -.04
IX           -.03   -.07    .25    .33    .05   -.22    .11
X            -.14   -.32    .10   -.20    .32   -.04    .10
Table 4


Multiple Linear Regression of Each ROS Dimension
on OBS2 Dimensions


ROS Dimensions      I     II    III     IV      V     VI    VII   VIII     IX      X

Multiple
Correlations      .95    .86    .89    .74    .86    .75    .80    .?5    .76    .67

Matrix  of  Direction  Cosines  Showing  Correspondence 
Between  Fitted  ROS  Vectors  and  0BS2  Dimensions 

QhSl 

Dimensions 


I 

,90 

-.07 

-.03 

.01 

.10 

.01 

.10 

-.04 

.00 

II 

-.20 

.92 

-.00 

.06 

.12 

.03 

-.04 

-.05 

.16 

-.06 

III 

-.08 

-.12 

.99 

-.04 

.11 

.04 

.02 

-.00 

.06 

.24 

IV 

-.17 

-.04 

-.02 

.95 

.14 

.09 

.03 

-,09 

.07 

-.25 

V 

-  01 

.18 

.04 

.00 

.94 

.17 

-.08 

-.07 

.08 

-.12 

VI 

.15 

-.00 

.11 

-.07 

.09 

.91 

-.00 

-.16 

.02 

-.11 

VII 

-.10 

-.30 

-.07 

.18 

.04 

.01 

.99 

.05 

.13 

.27 

VIII 

.28 

-.00 

-.05 

-.09 

-.16 

-.21 

-.02 

.9S 

.12 

-.02 

IX 

.07 

.10 

.03 

.16 

-.13 

-.27 

.04 

•  .13 

.55 

.08 

X 

.08 

-.02 

-.00 

-.18 

.17 

-.10 

.04 

-.07 

-.IE 

.88 

Matrix of Cosines Showing Relationship Between ROS Vectors
After Being Fitted into OBS2 Configuration

ROS
Vectors         I     II    III     IV      V     VI    VII   VIII     IX

I
II           -.20
III          -.09   -.09
IV           -.22   -.02   -.07
V            -.07    .23    .16    .10
VI            .12    .01    .15    .02    .31
VII          -.07   -.35   -.05    .20   -.04   -.03
VIII          .35   -.07   -.07   -.11   -.28   -.39    .09
IX           -.01    .22    .08    .26   -.05   -.23    .16
X             .07   -.10    .20   -.34    .01   -.24    .32
Table 5


Multiple  Linear  Regression  of  Each  ROS  Dimension 
on  STUDS  Dimensions 


ROS  Dimensions 


II 


III  IV 


VI 


VII     VIII  IX 


Multiple  Correlations 


STUDS 
Dimension 

I 

II 

III 

IV 

V 

VI 

VII 

VIII 

IX 

X 


.91 

.78 

79 

.71 

77 

.53 

.76 

.82 

.60 

.65 

Matrix of Direction Cosines Showing Correspondence
Between Fitted ROS Vectors and STUDS Dimensions

.87 

-.04  - 

.05 

.03  - 

.06 

.14 

-.06 

.32 

.09 

-.14 

-.27 

.84  - 

.00 

-.01  - 

.03 

-.22 

-.41 

-.24 

.05 

-.15 

-.23 

-.15 

.92 

.02 

.33 

.11 

.07 

-.22 

.27 

.19 

-.17 

-.18 

.04 

.89 

.12 

-.04 

.16 

-.29 

-.08 

.08 

-.23 

.02 

.32 

-.12 

.88 

-.08 

.01 

-.25 

.10 

.17 

.03 

-.16 

.17 

.14 

.22 

.86 

.13 

-.39 

-.12 

.09 

.03 

-.29  - 

.15 

.18 

.05 

-.09 

.78 

.11 

-.00 

.02 

.10 

-.22  - 

.03 

.00 

.06 

-.35 

.21 

.68 

-.16 

.19 

.10 

.05 

.02 

.37  - 

.17 

-.13 

-.28 

-.06 

.90 

-.23 

.10 

-.27  - 

.00 

.02 

.09 

-.17 

„23 

.13 

-.23 

.89 

Matrix  of  Cosines  Showing  Relationships  Between  ROS  Vectors 
After  Being  Fitted  into  STUDS  Configuration 


ROS 

Vectors 


I 


II 

-.26 

III 

-.34 

-.13 

IV 

-.05 

-.24 

.02 

V 

-.34 

-.18 

.62 

-.01 

VI 

.14 

-.20 

.20 

.03 

.12 

VII 

.06 

-.75 

-.02 

.20 

.22 

.05 

VIII 

.57 

-.29 

-.37 

-.28 

-.36 

-.49 

.23 

IX 

.03 

.17 

.27 

.23 

-.04 

-.10 

-.37 

-.19 

X 

-.09 

-.47 

.25 

-.00 

.38 

-.10 

.43 

.11 

.40 


Table  6 


Multiple Linear Regression of Each OBS2 Dimension
on OBS1 Dimensions


"IBrS^  !J linens  ions 

I       II      IT     T      V      VI     VII    vm  IX 

niltzri^  Correl-^tions 
.98      .92  .31      .92      .88      .S?      .9r  .88 


Matrix  of  Dir-ec 
Between  Fitter  01. 


CDsi-Ties  -^-lowing  Corrcr^^rrDndance 
ectors  arnd  OBSl  Liiv-^'--^orE5 


OBSl 

Dimensions 


I 

.96  - 

.01 

.  CI 

Dl 

.03 

1"*! 

u.  

i: 

-.03 

.55 

- .  1  . 

.  37 

-.09  - 

.01 

-.20 

Jit: 

111 

-.04  - 

.14 

.  14 

03 

.IC 

IV 

-.12  - 

.08 

12 

33 

_  1  ■ 

.17 

.  02 

,or 

V 

-.08  - 

.12 

.7 

1:6 

.10 

-.00 

VT 

.09  - 

.05 

-  Jl 

.09 

,  00 

LI 

V  i 

-.16  - 

.04 

:2 

.12 

.05 

12 

II I  i 

.12  - 

.03 

-.04 

.14 

-.  oT 

.96 

- .  12 

-.02 

IX 

.00  - 

.23 

.09 

-.03 

.09  - 

.10 

.55 

-.14 

-.13  - 

.0^ 

2 

.13 

.14 

.09 

-.14 

♦sxrix  of 

Cost 

Relationship 

0BS2 

Vectors 

After  Being  Frr-         iC  ^  OBSl  Cnitigura- ion 


Xtii.  .msiOTiS 


1 

II 

-.01 

III 

-.08 

IV 

-.13 

- .  25 

V 

-.10 

-.27 

v  < 

.  Jl 

VI 

.08 

.05 

-.11 

.  :3 

-.19 

/II 

-.23 

-.14 

.02 

..i6 

.17 

.13 

/III 

.16 

.03 

-.25 

-^33 

-.15 

-.23 

IX 

.01 

-.41 

.  ^-- 

.8 

.10 

-.01 

\ 

-.23 

-.09 

.1: 

06 

.18 

-.23 

•  .G5 
-  .14 


.24 
.06 


,25 
Table 7


Multiple Linear Regression of Each Rated Property


Pnoperties 


12  3  4  5  6  7 

Cone-    Diff.    MS,        lnm.      Coop.     Speed  ME. 


Nftxr:t\p"ie  C^rrelatims 
.  Z^         .  2-7  . 


.49 


.4C 


.33 


Matrix of Direction Cosines Showing Correspondence
Between Seven Fitted Property Vectors and AVE Dimensions


Configuration 
Dimensions 

I 

II 

IIT 

IV 

V 

VI 

Vli 

VIII 

IX 

X 

Properties 


-.01 

.  1^ 

-.19 

.27 

-.03 

.  ^ 

.43 

-.65 

-.09 

.2Z 

,37 

1 

.1: 

.20 

.02 

.17 

.G2 

.  09 

.OS 

-.17 

■)2 

.11 

-.00 

.22 

-M 

-..'1^ 

-.21 

-.41 

.15 

.4^ 

-.49 

. 

-.50 

.67 

-  i2 

-.3f 

.09 

-.12 

.26 

-  ii? 

-.30 

.38 

25 

.15 

.15 

.  54 

.25 

.56 

-.10 

.11 

,63 

.38 

.43 

-  .38 

-.•-5 

-.29 

-  .69 

.33 

-.49 

-.14 

Cosines of Angles Between Property Vectors


1.  Concentration 

2.  Difficulty 

3.  Manual  Skil- 

4.  Importance 

5.  Cooperation 

6.  Speed 

7.  Mental  Effort 


.92 


m 

.44 
96 


,11 
,29 
.12 
,69 


■.15 
.93 
.57 


-.07 
-.25 


.55 


1 


Table 8


Multiple Linear Regression of Each Rated Property
on ROS Dimensions


Properties          1      2      3      4      5      6      7
                Conc.  Diff.    MS.   Imp.  Coop.  Speed    ME.

Multiple
Correlations      .30    .27    .25    .29    .48    .36    .30


Matrix of Direction Cosines Showing Correspondence
Between Seven Fitted Property Vectors and ROS Dimensions

Configuration                      Properties
Dimensions          1      2      3      4      5      6      7

I                -.10   -.11    .25   -.21    .36   -.11   -.11
II                .63    .67   -.42    .10    .05    .04    .72
III               .10    .12    .24    .15    .05    .28    .13
IV                .18    .16    .41    .62    .27    .44    .24
V                -.30   -.22   -.59   -.45   -.14   -.66   -.20
VI                .40    .40   -.36    .17   -.78    .44    .44
VII               .14    .02    .11   -.06    .21   -.11    .03
VIII              .27    .29   -.12    .06    .00    .23    .23
IX                .32    .42   -.18    .17    .06   -.02    .31
X                -.32   -.17   -.11   -.51    .34   -.16   -.07

Cosines of Angles Between Property Vectors
in Configuration

Properties              1      2      3      4      5      6

1. Concentration
2. Difficulty         .97
3. Manual Skill      -.20   -.32
4. Importance         .65    .F5    .41
5. Cooperation       -.28   -.27    .5?    .16
6. Speed              .61    .55    .41    .82   -.24
7. Mental Effort      .95    .98   -.31    .53   -.24    .57
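
The property tables report, for each rated property, a multiple correlation and a set of direction cosines. A minimal sketch of how such figures can be obtained is given below, assuming the usual property-fitting procedure: the ratings are regressed on the configuration coordinates, and the regression weights are rescaled to unit length to give direction cosines. The paper does not confirm that this exact procedure was used; the function names and the toy data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_property(coords, ratings):
    """Return (multiple correlation, direction cosines) for one rated property."""
    n, k = len(coords), len(coords[0])
    means = [sum(row[j] for row in coords) / n for j in range(k)]
    y_mean = sum(ratings) / n
    xc = [[row[j] - means[j] for j in range(k)] for row in coords]  # centred coordinates
    yc = [y - y_mean for y in ratings]                              # centred ratings
    xtx = [[sum(r[i] * r[j] for r in xc) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(k)]
    weights = solve(xtx, xty)                                       # least-squares weights
    fitted = [sum(w * v for w, v in zip(weights, r)) for r in xc]
    r_mult = math.sqrt(sum(f * f for f in fitted) / sum(y * y for y in yc))
    norm = math.sqrt(sum(w * w for w in weights))
    return r_mult, [w / norm for w in weights]

# Toy two-dimensional configuration; ratings are exactly 3*x + 4*y (made-up data).
toy_coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
toy_ratings = [3 * x + 4 * y for x, y in toy_coords]
r_mult, cosines = fit_property(toy_coords, toy_ratings)
```

In the toy case the ratings are a perfect linear function of the coordinates, so the multiple correlation is 1 and the direction cosines are .6 and .8. Multiple correlations near .3, as in Table 8, would indicate that a property is only weakly recoverable from the configuration even when its direction is well defined.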


Table 9


Multiple Linear Regression of Each Rated Property
on SUPS Dimensions


Properties          1      2      3      4      5      6      7
                Conc.  Diff.    MS.   Imp.  Coop.  Speed    ME.

Multiple
Correlations      .33    .30    .28    .28    .46    .32    .35


Matrix of Direction Cosines Showing Correspondence
Between Seven Fitted Property Vectors and SUPS Dimensions

Configuration                      Properties
Dimensions          1      2      3      4      5      6      7

I                 .01    .15    .22   -.00    .41   -.15    .11
II                .24    .28   -.68   -.33   -.00   -.24    .30
III               .00    .06    .09    .14    .02    .05    .12
IV               -.11   -.13  ...33   -.24   -.22    .01   -.03
V                 .21    .38   -.26   -.00   -.13   -.32    .31
VI                .76    .51   -.42    .44   -.60    .70    .69
VII              -.38   -.61   -.08   -.48    .30   -.33   -.51
VIII              .21    .09   -.08   -.20    .09    .21    .05
IX                .26    .32   -.21    .23    .41    .22    .21
X                -.22    .01   -.27   -.54    .39   -.34    .04

Cosines of Angles Between Property Vectors
in Configuration

Properties              1      2      3      4      5      6

1. Concentration
2. Difficulty         .88
3. Manual Skill      -.47   -.45
4. Importance         .61    .52    .28
5. Cooperation       -.53   -.30    .23   -.48
6. Speed              .71    .44   -.02    .75   -.56
7. Mental Effort      .92    .96   -.54    .50   -.45    .52
Table 10


Multiple Linear Regression of Each Rated Property
on OBS1 Dimensions


Properties          1      2      3      4      5      6      7
                Conc.  Diff.    MS.   Imp.  Coop.  Speed    ME.

Multiple
Correlations       .3    .23    .27    .39    .49    .36    .31

Matrix of Direction Cosines Showing Correspondences
Between Fitted Property Vectors and OBS1 Dimensions

Configir^-rtio-T 
Dimen5>  :is 


I 

-  '  J4 

-.01 

.13 

-.19 

.25 

-.19 

-.05 

I~ 

/I.' 

.69 

-.57 

-.06 

-.19 

-.20 

.52 

I'.l 

-.05 

.15 

.22 

.09 

.20 

.16 

IT 

.42 

-.01 

.24 

-.08 

.16 

.38 

X 

.18 

-.45 

-.08 

-.24 

-.43 

.14 

ir 

.21 

-.53 

.37 

-.50 

.44 

.53 

v- 

-.  7 

-.34 

.11 

-.11 

.26 

-  . "  2 

-.30 

VI  :i 

:1 

.01 

-.15 

.20 

.31 

.15 

IX 

.42 

.38 

-.03 

.43 

.56 

.27 

.31 

X 

-  .49 

-.12 

-.35 

-.69 

.33 

_.=2 

-.22 

Prop^^t   es  Cosines  of  Angles  Between  Property  Vector 

in  Configuration 

1 .  C:'^'  'ntratrion 

2.  D    -  cult  .72 

3.  ±  Sk:  11  -.35 

4.  :iniH>'rrranc:'  .85 

5.  GczBffiPrration  -.19 

6.  Spwsr  '  .73 

7.  Mentt     Effort  .88 


.60 

.40 

.07 

.23 

.37 

-.16 

.19 

.20 

.91 

-.05 

.89 

-.61 

.64 

-.35 

.48 
Table 11


Multiple Linear Regression of Each Rated Property
on OBS2 Dimensions


Properties          1      2      3      4      5      6      7
                Conc.  Diff.    MS.   Imp.  Coop.  Speed    ME.

Multiple
Correlations      .34    .27    .26    .52    .48    .41    .33

Matrix of Direction Cosines Showing Correspondence
Between Seven Fitted Property Vectors and OBS2 Dimensions

Confr  jra"  n 
Dime:  ion; 


-.07 

.26 

-.19 

.27 

-.Ob 

-.01 

.07 

-.69 

-.37 

-.13 

-.35 

.25 

■I 

.24 

;6 

.00 

.22 

-.01 

.33 

.34 

/•/ 

-.13 

-.27 

-.06 

-.24 

-.15 

-.06 

-.20 

-.40 

-.41 

-.23 

-.59 

-.08 

T 

.74 

.§4 

-.39 

.26 

-.43 

.39 

.75 

II 

-.31 

-  j7 

.07 

-.16 

.20 

-.16 

-.31 

.24 

11 

-.20 

-.13 

.11 

.11 

.15 

J.  A 

.37 

.48 

.12 

.20 

.64 

.18 

.35 

X 

-.16 

-.03 

.08 

-.67 

.39 

-.42 

-.07 

Properties  Cosines  of  Angles  Between  Property  Vectors 

in  Configuration 

1 ,  Concentrat  ion 


2. 

Difficulty 

.90 

3. 

Manual  Skill 

-.28 

-.45 

4. 

Importance 

.53 

.24 

.27 

5. 

Cooperation 

-.13 

-.05 

.59 

-.19 

6. 

Speed 

.70 

.37 

.30 

.90     - . 04 

7. 

Mental  Effort 

.96 

.96 

-.43 

.37  -.17 

.53 
Table 12

Multiple Linear Regression of Each Rated Property
on STUDS Dimensions


Properties          1      2      3      4      5      6      7
                Conc.  Diff.    MS.   Imp.  Coop.  Speed    ME.

Multiple
Correlations      .31    .22    .22    .29    .43    .31    .31


Matrix of Direction Cosines Showing Correspondence
Between Seven Fitted Property Vectors and STUDS Dimensions

Configuration                      Properties
Dimensions          1      2      3      4      5      6      7

I                 .07    .18    .10   -.12    .32   -.07    .19
II                .59    .56   -.67    .26   -.03    .15    .58
III               .23    .38   -.38    .27   -.06    .28    .43
IV                .00    .07   -.23    .09    .09    .11    .01
V                -.18   -.12   -.17    .00   -.15   -.60   -.01
VI                .19    .11   -.13    .44   -.26    .37    .21
VII              -.11   -.38    .36   -.15    .36   -.07   -.28
VIII             -.03   -.15    .32    .00    .27    .15   -.24
IX                .68    .55    .25    .67    .35    .50    .49
X                -.23   -.08   -.07   -.41    .68   -.34   -.10

Cosines of Angles Between Property Vectors
in Configuration

Properties              1      2      3      4      5      6

1. Concentration
2. Difficulty         .91
3. Manual Skill       .34   -.55
4. Importance         .86    .74   -.22
5. Cooperation        .00   -.03    .37   -.28
6. Speed              .74    .60   -.01    .78   -.08
7. Mental Effort      .90    .97   -.61    .75   -.10    .54


Table 13

AVE Dimension I: Complexity of Task Context


Task                                                              Scale
Number  Task Statement                                            Value

Context Straightforward (task itself not necessarily easy)

105  PREPARING MARKER MARINE FOR LAUNCH (ASW)  1.05
108  DROPPING PARACHUTE FLARE (ASW)  1.01
100  KEEPING CHUTES LOADED IAW NAVS INSTRUCTIONS (ASW)  1.01
098  FIRING SONO CHUTES LOCALLY (ASW)  1.00
106  LAUNCHING MARKER MARINE (ASW)  .99
103  FIRING LIBRASCOPE MANUALLY (ASW)  .99
117  SETTING UP MAI BAGS  .98
095  CHECKING SONO CHUTES ON PFI (ASW)  .98
110  CHECKING HULCHER CAMERA SERVICEABILITY (FRAME COUNTER & MOTOR VIBRATION)  .97
096  SETTING SWITCHES ON SONOS FOR PROPER DEPTH/LIFE (ASW)  .96


Comments

Tasks  loading  on  the  positive  end  of  this  dimension  tend  to 
involve  fairly  gross  physical  actions.    Many  of  these  are 
relatively  straightforward,  and  little  judgement  is  required 
when  to  initiate  action,  since  the  stimulus  to  initiate  action 
generally  originates  from  outside  the  individual,  often 
from  another  member  of  the  crew. 

The  situation  or  milieu  in  which  the  action  can  be  performed 
is  obviously  important.     For  example,  tasks  144  and  145  may 
not  be  complex  to  perform  in  and  of  themselves,  but  when  they 
have  to  be  done  under  the  pressure  of  an  operational  mission, 
the  larger  work  context  of  which  they  are  a  part  can  take  on 
immense  complexity. 

Context Complex (task itself not necessarily difficult)

021  CHANGING RANGE SCALE DURING HOMING  -.79
152  CHECKING SERVICEABILITY OF ARC505 WITH EXTERNAL AGENCY ON VOICE (COMM)  --^S
058  SELECTING & ADJUSTING CONTROLS FOR MAD OPERATION (DETECTION)  -.81
149  CALIBRATING MODEM (COMM)  -.82
150  CHECKING SERVICEABILITY OF ORESTES CONTROL BOX (COMM)  -.84
145  CHANGING PAPER, RIBBONS & TAPE IN TELEPRINTERS (COMM)  -.85
151  CHECKING SERVICEABILITY OF JASON CONTROL BOX (COMM)  -.86
144  CHECKING PAPER, RIBBONS & TAPE IN TELEPRINTERS (COMM)  -.89
017  ADVISING PILOT OF HEADING TO PERFORM HOMING (ASV)  -.91

Table 14


AVE Dimension II: Multivariate Nature of Task and Degree
of Interpretation/Decision Making When Performing Task

Task                                                              Scale
Number  Task Statement                                            Value

Important Interpretation/Decision Making Component,
Typically with Many Facets

072  OBTAINING TURN COUNT (AURAL LISTENING-DETECTION)  .57
071  CATEGORIZING TARGET BY CLASS, DOPPLER, DISTANCE (AURAL LISTENING-DETECT)  .56
025  MAINTAINING VARIABLE RANGE MARKER ON TARGET (ASV)  .54
027  CALLING ACCURATE ON-TOP (ASV)  .53
029  INSPECTING MAP FOR IDENTIFIABLE LANDMARKS (ASV)  .51
028  ADJUSTING SCOPE PRESENTATION FOR BEST MAP READING (ASV)  .51
070  DISCRIMINATING TARGET FROM BACKGROUND (AURAL LISTENING-DETECTION)
010  CHECKING QUALITY OF SCOPE PRESENTATION ON GROUND VIDEO CHECK (ASV)  .47
033  INFORMING NAVIGATOR OF LANDMARK & ITS RANGE & BEARING (ASV)  .47
008  SETTING UP GIVEN SECTOR FOR TRANSMITTER CHECK (ASV)  .46


Comments 


Associated with the high end of this dimension are tasks with
much interpretation/decision making, generally involving many
different variables.

The  AVE  configuration  vectors  relating  most  strongly  to  difficulty, 
manual  skill,  and  mental  effort  have  respective  direction  cosines 
of  .43,  -.65,  and  .37  with  this  dimension.    This  profile  is 
consistent  with  the  interpretation  ascribed  this  scale. 


Straightforward Tasks with Low Level
of Interpretation/Decision Making

107  PREPARING PARACHUTE FLARE FOR DROP (ASW)  -.47
154  MAKING A MESSAGE TAPE (COMM)  -.48
120  PFI & PRESETTING OF ECM COMPONENTS  -.50
109  RETURNING PARACHUTE FLARE TO STORAGE (ASW)  -.54
108  DROPPING PARACHUTE FLARE (ASW)  -.54
002  CHECKING METER VOLTAGES ON TURN-ON (ASV)  -.55
093  FIRING RETRO LOCALLY (ASW)  -.55
119  ACTING AS A MAI DROP MASTER  -.56
118  DROPPING MAI BAGS IN SEQUENCE  -.64
117  SETTING UP MAI BAGS  -.67


Table 15

AVE Dimension III: Fineness/Grossness of Task Activity


Task                                                              Scale
Number  Task Statement                                            Value

Manipulating, Finetuning, Checking Kinds of Behaviours

089  IDENTIFYING & RECTIFYING AP102 FAULTS LISTED IN CHECKLIST (DETECTION)  .93
061  IDENTIFYING & RECTIFYING MAD FAULTS LISTED IN CHECKLIST (DETECTION)  .93
078  IDENTIFYING & CALLING ECHOES DURING JULIE OPERATIONS (DETECTION)  .87
077  SELECTING PROPER BUOYS DURING JULIE OPERATIONS (DETECTION)  .86
020  CALLING RANGES DURING HOMING (ASV)  .80
088  SETTING UP AR102 REMOTE CONTROL TO RECORD A FACILITY (DETECTION)  .79
073  INSPECTING JULIE RX & AJH501 RECORDERS DURING PFI (DETECTION)  .76
056  CHECKING MAD INTERNAL NOISE (DETECTION)  .74
069  LISTENING TO WATER AND TARGET AUDIO (AURAL LISTENING-DETECTION)  .73
040  CHANGING ASV RECEIVER CRYSTALS (ASV)  .73


Comments 


Tasks relating to checking, fine manipulation, tuning,
identifying, and so on tend to load highly on this dimension,
whereas tasks requiring gross response behaviours predominate
on the lower end.  While this dimension in the STUDS configuration
shares direction cosines of .58, -.38 and .43 with Difficulty,
Manual Skill, and Mental Effort, respectively, its relationships with
any of the seven properties in the configurations of the remaining
groups were generally quite small.

Gross Behaviours, Heavy Lifting, Carrying, General Dogwork

136  IDENTIFYING BASIC RADAR TYPE FROM AURAL PRF (ECM)  -.44
114  TAKING PICTURES IN NOSE WITH HULCHER CAMERA  -.44
141  ADJUSTING AE GAIN AND/OR ATTENUATION DURING HOMING (ECM)  -.45
115  KEEPING ACCURATE HULCHER CAMERA LOG (ASW)  -.45
164  PRESSING SYNCH BUTTON ON TIME CHECK CUE ON LF (COMM)  -.46
169  IDENTIFYING AND RECTIFYING COMM EQUIPMENT FAULTS LISTED IN CHECKLIST  -.46
093  FIRING RETRO LOCALLY (ASW)  -.48
130  ADJUSTING SCOPE FOR BEST D/F SIGNAL (ECM)  -.50
094  UNLOADING & TURNING OFF RETRO (ASW)  -.52
132  READING ANALYZER FOR PW & PRF (ECM)  -.54
Table 16


AVE Dimension IV: Criticality of Task Activity to Mission Success

Task
Number  Task Statement

Important to Successful Completion of Mission
That Task Done Correctly

086  LOADING TAPES IN AR102 TAPE RECORDERS (DETECTION)
163  KEYING JASON (COMM)
077  SELECTING PROPER BUOYS DURING JULIE OPERATIONS (DETECTION)
149  CALIBRATING MODEM (COMM)
087  CHECKING AR102 RECORDER METER FOR RECORDING ON BOTH CHANNELS (DETECTION)
078  IDENTIFYING & CALLING ECHOES DURING JULIE OPERATIONS (DETECTION)
067  PLACING RULER ON CHART FOR TARGET RANGE CHECK (SSQ47-DETECTION)
150  CHECKING SERVICEABILITY OF ORESTES CONTROL BOX (COMM)
088  SETTING UP AR102 REMOTE CONTROL TO RECORD A FACILITY (DETECTION)
145  CHANGING PAPER, RIBBONS & TAPE IN TELEPRINTERS (COMM)

Comments 

In general the tasks at the high end of the scale are those that,
if not done correctly, could lead to a mission not being completed
successfully.  Tasks that are "critical" in this regard are not
necessarily difficult, nor do they necessarily demand a great deal of
concentration or mental effort, as evidenced by the fact that vectors
relating most closely to these properties in the ROS configuration
have respective cosines of .16, .18, and .24 with Dimension IV.  (This
is to be expected, since a simple task like throwing an integral
switch could be critical.)

Dimension IV in the ROS configuration is more clearly interpreted
as relating to "criticality", particularly when relationships to the
property vectors are considered (see Table 29).  While Dimension IV
in the OBS1 configuration and Dimension IV in the OBS2 configuration
are strongly related, the ROS Dimension IV does not compare as closely
to the corresponding dimension in any of the other groups.


Scale 
Value 
Table 16 (continued)


Task                                                              Scale
Number  Task Statement                                            Value

Not  so  Crucial  to  Successful  Completion 
of  Mission  (Larger  Error  Tolerance) 


057  ORIENTING  MAD  (ID-378) 

041  INSPECTING  ASH  EQUIPMENT  IN  NOSE  ON  PFI  (DETECTION)  -.43 

044  CHECKING  SERVICEABILITY  OF  ASH  SYSTEM  (DETECTION)  -^^ 

039  IDENTIFYING  5  RECTIFYING  ASV  FAULTS  LISTED  TN  CHECKLIST 


-.43 
-.43 
-.43 


-.44 


048      IDENTIFYING  A  "SIGNAL  OUT"  SITUATION  ON  ASH  (DETECTION)  -.44 
038      C0f4MUNlCATING  WITH  PILOT  IN  CALM  CONFIDENT  MANNER  DURING 
WX  (ASV) 

047      IDENTIFYING  A  PEAK  ON  AN  ASH  TRADE  (DETECTION) 
055      CENTERING  RECORDER  PEN  USING  OUTPUT  BALANCE  §  PEN  POSITION 
CONTROLS -MAD 

054      SETTING  PEN  SELECTION  SWITCHES  FOR  MAD  ON  RECORDER 
(DETECTION) 

030      INSPECTING  SCOPE  FOR  SIMmR  CONTOURS  WHEN  MAP  READING 
(ASV) 


-.55 
-.46 

-.47 

-.55 

-.57 
Table 17


AVE Dimension V: Source Initiating Activity

Task                                                              Scale
Number  Task Statement                                            Value

Internal or Self Initiated

120  PFI & PRESETTING OF ECM COMPONENTS  .65
162  KEYING ORESTES (COMM)  .63
160  LOOKING UP A MANUAL FREQUENCY ON ARC38 (COMM)  .62
159  SETTING A PRESET FREQUENCY ON ARC38 (COMM)  .61
161  SETTING A MANUAL FREQUENCY ON ARC38 (COMM)  .61
152  CHECKING SERVICEABILITY OF ARC505 WITH EXTERNAL AGENCY ON VOICE (COMM)  .56
146  LOADING & CHECKING LP BLACK BOX (COMM)  .56
149  CALIBRATING MODEM (COMM)  .51
150  CHECKING SERVICEABILITY OF ORESTES CONTROL BOX (COMM)  .51
043  SETTING PEN SELECTION SWITCHES FOR ASH ON RECORDER (DETECTION)  .51

Comments 

Tasks at the high end of this scale tend to be those which the
individual himself initiates.  Those at the opposite end tend to be
initiated by others, either within or outside the aircraft.

Externally Initiated

015  ALIGNING RANGE & BEARING MARKERS ON TARGET (ASV)  -.48
083  CHECKING & CALLING BUOY SERVICEABILITY (RX/AUDIO/HYDROPHONE) (JULIE)  -.49
062  SETTING SWITCHES FOR BUOY SELECTION (SSQ47-DETECTION)  -.50
063  DETECTING TARGET FROM RECORDER (SSQ47)  -.51
086  LOADING TAPES IN AR102 TAPE RECORDERS (DETECTION)  -.52
079  MEASURE/PASS SINGLE ECHO MASTER RANGES (JULIE-DETECTION)  -.54
080  MEASURE/PASS SINGLE ECHO SLAVE RANGES (JULIE-DETECTION)  -.56
082  MEASURE & PASS JULIE DOUBLE ECHO RANGES-DROPPED SIMULTANEOUS, DEEP/SHALLOW  -.62
074  CHANGING PAPER IN AJH501 RECORDER (JULIE-DETECTION)  -.63
081  MEASURE/PASS DOUBLE ECHO RANGES (JULIE-DETECTION)  -.74
Table 18


AVE Dimension VI: Teamwork or Cooperation Involved


Task                                                              Scale
Number  Task Statement                                            Value

Tasks Done Primarily by Self

063  DETECTING TARGET FROM RECORDER (SSQ47)  .63
019  ADJUSTING SCOPE OPTIMAL TARGET PRESENTATION (ASV)  .61
141  ADJUSTING AE GAIN AND/OR ATTENUATION DURING HOMING (ECM)  .60
164  PRESSING SYNCH BUTTON ON TIME CHECK CUE ON LF (COMM)  .S&
090  CHECKING & RECORDING ALL ASW STORES ON BOARD DURING PFI (RETRO)  .58
169  IDENTIFYING & RECTIFYING COMM EQUIPMENT FAULTS LISTED IN CHECKLIST
002  CHECKING METER VOLTAGES ON TURN-ON (ASV)  .57
062  SETTING SWITCHES FOR BUOY SELECTION (SSQ47-DETECTION)  .56
130  ADJUSTING SCOPE FOR BEST D/F SIGNAL (ECM)  .55
066  TAKING TARGET RANGE CHECK WITH STOP WATCH (SSQ47-DETECTION)  .50


Comments 


Loading on the high end of this dimension are tasks the individual
does primarily by himself.  Further, those tasks tend to require
concentration (.59), are difficult (.49), are important (.38),
may have to be done quickly (.45), and require considerable mental
effort (.67), as evidenced by the direction cosines between this
dimension and the configuration vectors relating most strongly to
these properties.  Tasks at the opposite pole tend to demand manual
skill (-.49) and cooperation (-.50).  These relationships represent
a profile that is consistent with the interpretation given this scale.

Many of the tasks loading at the low end of the dimension require
cooperation in the sense that the individual must rely on someone
else to do, or not do, something (e.g., throw a switch) before or while
the task is performed.

Tasks Requiring Reliance on Someone Else to Do or Not Do Something

108  DROPPING PARACHUTE FLARE (ASW)  -.37
094  UNLOADING & TURNING OFF RETRO (ASW)  -.41
157  LOGGING ALL MESSAGES RECEIVED & TRANSMITTED ON LF (COMM)  -.42
104  UNLOADING LIBRASCOPE (ASW)  -.44
099  UNLOADING SONO CHUTES (ASW)  -.46
097  LOADING SONOS IN CHUTES (ASW)  -.48
078  IDENTIFYING & CALLING ECHOES DURING JULIE OPERATIONS (DETECTION)  -.53
077  SELECTING PROPER BUOYS DURING JULIE OPERATIONS (DETECTION)  -.56
140  ESTIMATING RELATIVE TARGET MOVEMENT & DRIFT (ECM)  -.59
116  INSTALLING MAI CHUTES  -.67
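
The Table 18 comments read the dimension's profile from direction cosines. Converting each cosine to an angle can make the profile easier to picture: values near 90 degrees mean a property is nearly unrelated to the dimension, and values above 90 degrees mean it points toward the opposite pole. The sketch below uses the seven cosines quoted in the comments; the function name is hypothetical and the conversion is offered only as a reading aid.

```python
import math

def angle_degrees(direction_cosine):
    """Angle, in degrees, between a property vector and a dimension axis."""
    return math.degrees(math.acos(direction_cosine))

# Direction cosines with AVE Dimension VI, as quoted in the Table 18 comments.
profile = {
    "concentration": .59,
    "difficulty": .49,
    "importance": .38,
    "speed": .45,
    "mental effort": .67,
    "manual skill": -.49,
    "cooperation": -.50,
}
angles = {name: angle_degrees(c) for name, c in profile.items()}
```

On this reading, mental effort lies closest to the positive pole of the dimension, while cooperation, with a cosine of -.50, sits 120 degrees away and so points toward the opposite pole.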


Table 19

AVE Dimension VII: Discreteness of Task Event

Task                                                              Scale
Number  Task Statement                                            Value

Tasks  Involving  Single  Discrete  Behaviours 

120      PFI  §  PRESETTING  OF  ECM  COMPONENTS  -^J 

093  FIRING  RETRO  LOCALLY  (ASW)  ' 

   ,„..r..,-r»to   r.«n*,-iniTp    CtADP   TO   "^TORAriF    (ASWl  '^^ 

052  CHECKING  BATTERY  IN  MAD  SET  (DETECTION)  -^^ 

104  UNLOADING  LIBRASCOPE  (ASW)  ' 

108  DROPPING  PARACHUTE  FIARE  (ASW)  • 

094  UNLOADING  f,  TURNING  OFF  RETRO  (ASW) 

no  cJl^CK?NG  HULCHER  CAMERA  SERVICEABILITY  (FRAME  COUNTER  5 

MOTOR  VIBRATION)  \f. 

107  PREPARING  PARACHUTE  FL\RE  FOR  DROP  (ASJ  -J" 

059  IDENTIFYING  5  CALLING  MAD  MARKS  (DETECTION)  -46 

Comments 

Tasks loading high on this dimension tend to be those requiring
relatively discrete action.  It appears to be coincidental that
many of these tasks are also of a heavy physical nature.

Tasks on the low end of this scale appear to require chained
sequences of events, often over reasonably long time periods.
Many of the activities involved in these tasks are such that a
later step is contingent on what occurs in a former one.  That is,
step X generally will not be performed until step X-1 has been
completed.

Tasks  Involving  Chained  and  Sequenced  Activities 
138      IDENTIFYING  5  RECTIFYING  ECM  FAULTS  LISTED  IN  CHECKLIST  ^ 

nft<;  TATEGORIZING  DOPPLER  (SSQ47 -DETECTION)  "'^^ 

565  SeISnG  BEST  FREOUElicY  FOR  USE  WITH  AGENCY  (COMM)  -39 

147  LOADING  5  CHECKING  HF  BLACK  BOX  (COMM) 

1^4  THANGING  TUNING  UNITS  5  RF  CALIBRATING  (ECM) 

2^  cS  NG  XTAL  SERVICEABILITY  OVER  TUNERS  RANGE  (ECM)  -4 

007  SS?NG  DRIFT  FROM  DRIFT  5  HEADING  MARKERS  (ASV)  -41 

?48  cSnG  SErJkEABILITY  OF  ARO  38  WITH  EXTERNAL  AGENCY  _ 

^An  PCiSSAITTNG  RELATIVE  TARGET  MOVEMENT  5  DRIFT  (ECM)  -47 
'l56      LoS  ALL  ME^sIgES  RECEIVED  ,  TRANSMITTED  ON  HF  (COMM)  -58 
Table 20


AVE Dimension VIII: Danger Level in Task Event


Task                                                              Scale
Number  Task Statement                                            Value

Hazardous Situation

118  DROPPING MAI BAGS IN SEQUENCE  .56
105  PREPARING MARKER MARINE FOR LAUNCH (ASW)  .55
100  KEEPING CHUTES LOADED IAW NAVS INSTRUCTIONS (ASW)  .55
095  CHECKING SONO CHUTES ON PFI (ASW)  .54
098  FIRING SONO CHUTES LOCALLY (ASW)  .52
107  PREPARING PARACHUTE FLARE FOR DROP (ASW)  .52
103  FIRING LIBRASCOPE MANUALLY (ASW)  .51
109  RETURNING PARACHUTE FLARE TO STORAGE (ASW)  .51
092  CLEARING JAMMED RETRO (ASW)  .50
099  UNLOADING SONO CHUTES (ASW)  .50


Comments 

Tasks  at  the  high  end  of  this  dimension  are  either  hazardous 
in  and  of  themselves  or  are  performed  under  hazardous  conditions. 
Tasks  at  the  lower  end  tend  to  involve  rather  safe  activities 
usually  performed  under  safe  conditions. 

Configuration  vectors  corresponding  to  the  properties 
concentration  and  speed  show  direction  cosines  of  .38  and  .34 
with  this  dimension.    These  relationships  are  logical  since 
handling  these  dangerous  tasks  safely  requires  considerable 
concentration,  and  many  (but  not  all)  of  the  hazardous  tasks 
are  performed  when  quick  response  is  of  the  essence. 

Nonhazardous  Situation 


127  CHECKING KD2 CAMERA (ECM)  -.44

150  CHECKING SERVICEABILITY OF ORESTES CONTROL BOX (COMM)  -.46

112  SETTING UP SHUTTER SPEED & LENS OPENING ON HULCHER (ASW)  -.56

130  ADJUSTING SCOPE FOR BEST D/F SIGNAL (ECM)  -.61

062  SETTING SWITCHES FOR BUOY SELECTION (SSQ47-DETECTION)  -.62

019  ADJUSTING SCOPE OPTIMAL TARGET PRESENTATION (ASV)  -.63

141  ADJUSTING AE GAIN AND/OR ATTENUATION DURING HOMING (ECM)  -.64
090  CHECKING & RECORDING ALL ASW STORES ON BOARD DURING PFI (RETRO)  -.64
169  IDENTIFYING & RECTIFYING COMM EQUIPMENT FAULTS LISTED IN CHECKLIST  -.64

164  PRESSING SYNCH BUTTON ON TIME CHECK CUE ON LF (COMM)  -.66
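
The direction cosines cited in the Comments for this dimension can be read as the cosine of the angle between a fitted property vector and a dimension axis of the configuration. The following is a minimal sketch of that computation, not the procedure used in the study; the three-dimensional space and the vector values are hypothetical and chosen only for illustration.

```python
import math


def direction_cosine(vector, axis_index):
    """Cosine of the angle between `vector` and coordinate axis `axis_index`."""
    norm = math.sqrt(sum(component * component for component in vector))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return vector[axis_index] / norm


# Hypothetical property vector in a three-dimensional configuration space.
concentration = [0.38, 0.60, 0.70]

# One direction cosine per dimension; their squares always sum to 1.
cosines = [direction_cosine(concentration, i) for i in range(len(concentration))]
print([round(c, 2) for c in cosines])
```

A value near 1 or -1 would mean the property runs almost parallel to the dimension, whereas values such as .38 and .34 indicate only a moderate alignment.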
Table 21

AVE Dimension IX: Degree of Imagery or Orientation to Earth


Task                                                        Scale
Number      Task Statement                                  Value

Activity or Imagery (i.e., Keeping HC Log)
 Related to Earth's Surface

116  INSTALLING MAI CHUTES  1.03
078  IDENTIFYING & CALLING ECHOES DURING JULIE OPERATIONS

077  SELECTING PROPER BUOYS DURING JULIE OPERATIONS (DETECTION)  .77

140  ESTIMATING RELATIVE TARGET MOVEMENT & DRIFT (ECM)  .70

112  SETTING UP SHUTTER SPEED & LENS OPENING ON HULCHER (ASW)  .53

114  TAKING PICTURES IN NOSE WITH HULCHER CAMERA  .51

115  KEEPING ACCURATE HULCHER CAMERA LOG (ASW)  .48
008  SETTING UP GIVEN SECTOR FOR TRANSMITTER CHECK (ASV)  .41
091  TURNING ON AND LOADING RETRO (ASW)  .39
111  ASSESSING WX FOR PROPER HULCHER CAMERA SET-UP (ASW)  .39

Comments 


The profile of task scale values on this scale suggests that
the dimension differentiates tasks on the basis of
whether they are surface oriented or not (either in terms of
direct activity or imagery to support activity). Tasks having
to do with the earth's surface (ground / water) tend to load
on the high end.

Scale  value  variability  on  this  scale  is  not  as  large  as  in  many 
of  the  others,  particularly  at  the  low  end.    This  might  suggest 
that  the  low  end  of  the  scale  is  not  well  defined. 

Activity  not  Oriented  to  Earth's  Surface 

120  PFI & PRESETTING OF ECM COMPONENTS  -.24
068  SETTING SWITCHES AND KNOBS FOR OPERATION (AURAL LISTENING-DETECTION)  -.25

011  PERFORMING POST TAKE OFF CHECK (ASV)  -.25
148  CHECKING SERVICEABILITY OF ARC 38 WITH EXTERNAL AGENCY (COMM)  -.25

125  SETTING UP PANORAMIC PRESENTATION ON SCOPE (GRASS-ECM)  -.28

157  LOGGING ALL MESSAGES RECEIVED & TRANSMITTED ON LF (COMM)  -.29
168  REPLYING AS REQUIRED TO EXTERNAL AGENCIES IN COMM OPERATIONS  -.30

006  CHECKING & ALIGNING HEADING MARKER (ASV)  -.32

085  INSPECTING AR102 TAPE RECORDERS DURING PFI (DETECTION)  -.33

005  CHECKING ASV SECTOR SCAN (ASV)  -.34
Table 22


AVE Dimension X: Housekeeping Functions


Task
Number  Task Statement

Primarily Checking and Housekeeping Functions
in Which Unsuccessful Performance Is
Not Generally Crucial to Mission Success

115  KEEPING  ACCURATE  HULCHER  CAMERA  LOG  (ASW) 

114  TAKING  PICTURES  IN  NOSE  WITH  HULCHER  CAMERA 

112  SETTING  UP  SHUTTER  SPEED  AND  LENS  OPENING  ON  HULCHER  (ASW) 

083  CHECKING & CALLING BUOY SERVICEABILITY (RX/AUDIO/HYDROPHONE) (JULIE)

120  PFI & PRESETTING OF ECM COMPONENTS

084  IDENTIFY & RECTIFY JULIE FAULTS LISTED IN CHECKLIST (JULIE DETECTION)
089  IDENTIFYING & RECTIFYING AR102 FAULTS LISTED IN CHECKLIST (DETECTION)

088  SETTING UP AR102 REMOTE CONTROL TO RECORD A FACILITY (DETECTION)

085  INSPECTING AR102 TAPE RECORDERS DURING PFI (DETECTION)
076  CHECKING CHART SPEED WITH ROOF TOP CHECKER (JULIE-DETECTION)

Comments 

To  some  extent,  tasks  loading  on  both  ends  of  this  dimension 
involve  checking  and  housekeeping  functions.    The  difference 
between  them  appears  to  be  the  fact  that  those  on  the  low  end, 
if  not  done  correctly,  by  themselves  are  more  likely  to  be 
responsible  for  mission  failure.    The  nature  of  the  profile 
of  direction  cosines  between  this  dimension  and  configuration 
vectors  corresponding  most  closely  to  the  properties 
concentration  (-.38),  importance  (-.69),  cooperation  (.33), 
and  speed  (-.49)  reinforces  this  interpretation. 

Primarily  Checking  and  Set  up  Functions  that, 
if  Done  Incorrectly,  Could  Lead  to  Mission  Failure 

004  CHECKING & SETTING ASV TILT (ASV)

096  SETTING SWITCHES ON SONOS FOR PROPER DEPTH/LIFE (ASW)
165  SELECTING BEST FREQUENCY FOR USE WITH AGENCY (COMM)

121  CHANGING ECM ANTENNA IN AFT LOWER FUSELAGE COMPARTMENT (ECM)
153  SETTING ARC505, TELEPRINTER, & I/C SWITCHES TO TRANSMIT A RATT MESSAGE
064  DETECTING TARGET FROM HEADSET (SSQ47)

074  MAINTAINING DRIFT MARKER ON TARGET (ASV)

131  CENTERING SIGNAL ON PAN PRESENTATION (ECM)
001  PFI & PRESETTING ASV21 COMPONENTS (ASV)
091  TURNING ON AND LOADING RETRO (ASW)
Obstacles to and Incentives for Standardization
of  Task  Analysis  Procedures 


by 

Robert  W.  Stephenson 
and 

Hendrick  W.  Ruck 

Air  Force  Human  Resources  Laboratory 
Brooks  AFB,  Texas 


The  opinions  and  conclusions  expressed  in  this  paper 
are  those  of  the  authors  and  are  not  necessarily 
those  of  the  United  States  Air  Force. 


A  number  of  critical  papers  have  been  written  regarding  the  status 
of  task  analysis  in  the  Air  Force  and  the  assumptions  upon  which  task 
analysis  is  managed  within  the  Department  of  Defense.    Montemerlo  and 
Harris  (1978)  cited  a  long  list  of  such  papers  at  the  1978  Annual 
Convention of the American Psychological Association. One of the major
conclusions in their paper is:

"...while  everyone  agrees  on  the  need  for  task  analysis, 
there  is  almost  no  agreement  as  to  what  it  is." 

The writers go on to conclude that "the procedural approach to task
analysis  has  not  and  cannot  work,"  because  task  analysis  is  essentially 
a  judgmental  process.    Many  other  experts  referenced  in  their  paper  had 
come  to  similar  conclusions. 

In  the  face  of  so  many  opinions  that  task  analysis  should  not  be 
proceduralized in the first place, one feels a bit awkward presenting a
paper  about  standardization  of  task  analysis  procedures.    As  shall  be 
seen,  however,  our  own  position  is  not  incompatible  with  that  presented 
by  Montemerlo  and  Harris. 

Different  Kinds  of  Task  Analysis 

Before discussing obstacles and incentives for standardization, it
is necessary to clarify what kind of task analysis we are talking
about. In the Air Force, one can distinguish six different kinds of
task analysis as shown in Table 1: the task analysis associated with
the design of new weapons, which is an intrinsic part of the research
and development process; the task analysis associated with the preparation
of Technical Orders after the weapon system has been designed
(these first two types of task analysis are usually conducted by a
weapons development contractor during the weapons development process);


Table 1  Six Different Kinds of Task Analysis

                                                              Requirements for Acceptable
Task Analysis For...    Objective                             Judgment          Personnel

New systems design      Specify task procedures for           Extremely High    Developmental systems
                        undesigned weapon systems                               design engineers

Systems documentation   Specify task procedures for           Very High         Systems design engineers
                        previously designed weapon
                        systems

Systems evaluation      Evaluation of technical orders        High              Systems analysts
                        by government experts

Training design for     Training performance objectives       High              Professionally trained
unperformed jobs        and design of new courses for                           ISD analysts
                        new jobs

Training design for     Training performance objectives       Moderate          Subject matter experts
existing jobs           and design of courses for                               with expertise in
                        existing jobs                                           training

Course revision         Minor additions to an existing        Moderate          Subject matter experts
                        course                                                  who take short courses
                                                                                for orientation
the task analysis that is conducted when the contractor's Technical
Orders are evaluated by the government (in the Air Force, these evaluations
are typically conducted at Edwards Air Force Base); the task
analysis that is conducted after the jobs have been established but
before occupational survey data are available; the task analysis that
is conducted after occupational survey data are available but before
the course has been written; and finally, the task analysis that is
conducted in order to revise an existing course.

We agree with the various experts cited in Montemerlo and Harris
(1978) about the need for judgment and experience, as opposed to
standardized procedures, especially for the three or four kinds of task
analysis that appear early on this list. We would maintain, however,
that both the feasibility and desirability of standardization increase
as one gets further and further away from the original weapons development
process. One of the reasons that standardization is desirable is
that the personnel who conduct the task analysis in these later stages
are typically not professional analysts. There are some hard realities
in the Department of Defense budget that force us to use enlisted
subject matter experts who are not professionally trained as system
analysts or educators. To the extent that such personnel are used, the
need for standardization and simplification of procedures increases.

Failure to make distinctions between these various types of task
analysis  can  lead  to  a  lot  of  confusion.    It  is  not  unusual  for  someone 
to  attack  task  analysis  procedures  designed  for  revising  courses  because 
they  are  not  documented  with  the  kind  of  detail  required  for  weapons 
systems  design.    ISD  experts  sometimes  get  upset,  for  example,  because 
all  of  the  procedures  designed  for  brand  new  jobs  are  not  being  used  in 
the  revision  of  Air  Force  courses.    The  probability  of  this  reaction  is 
increased  by  the  fact  that  most  ISD  manuals  are  designed  for  new  courses 
rather than the revision of existing ones. We have also encountered a
similar  type  of  confusion  in  which  subject  matter  experts  are  asked  to 
do  jobs  that  they  are  not  qualified  to  do,  because  it  is  assumed  that  a 
person  who  can  do  one  type  of  task  analysis  is  qualified  to  do  other 
types  that  really  require  professional  training. 

Granted that some kinds of task analysis do require expert judgment
by professionally trained personnel and some do not, the question
addressed in this paper is, "To what extent should we standardize
procedures for those kinds of task analysis that are currently being
conducted by subject matter experts rather than by professional analysts?"
In other words, to what extent should we standardize procedures for the
two types of task analysis shown in the last two rows of Table 1?

Incentives  for  Standardization 

There are many incentives for standardization, and most of them
are relatively obvious (see Table 2). One can avoid duplication of
effort, facilitate communication, improve management
control, provide a consistent basis for evaluation, help inexperienced
subject matter experts to benefit from the experience of professional
analysts, and so on. These things are especially important in the
military environment, where there is rapid turnover of key personnel
and rapid technological change in the jobs.


Table 2  Incentives for Standardization

Minimize duplication of effort
Facilitate communication
Improve management control
Provide a consistent basis for evaluation
Provide inexperienced personnel with useful guidelines
Facilitate training of new task analysts


Obstacles  to  Standardization 

The obstacles to standardization are perhaps not so obvious (see
Table 3). First, let us deal with the obstacles to DOD-wide standardization
across Army, Navy, Air Force, and Marine Corps. In the first
place, the jobs are different. At one extreme, the Navy must provide
personnel with a wide diversity of qualifications for assignments to
small ships. The Navy consequently has very broadly defined job categories,
called ratings. At the other extreme, the Air Force, which
typically has large installations that work with highly specialized
equipment, has very technical jobs in specific job categories that are
more narrowly defined than the Navy ratings. The Army and Marine Corps
fall in between the extremes.

Another obstacle to standardization is that the various task
analysis organizations are staffed differently. The Navy seems to have
highly qualified people at its new Instructional Program Development
Centers. The Air Force, by contrast, has had to react to budget cuts
by repeatedly decreasing the number of professionally trained personnel
at the Air Force technical training centers. The Marine Corps has the
fewest professionals, while the Army is probably more similar to the Air
Force than it is to the Navy.

Another obstacle to DOD-wide standardization is that the occupational
survey inputs to ISD personnel regarding established jobs vary
from service to service. All services do use occupational survey data,
but the extent to which these data are analyzed before they are submitted
for use in task analysis and training program design varies
considerably from service to service. The Air Force, which originally
designed the occupational survey methods used today (Christal, 1974),
Table 3  Obstacles to Standardization
of Task Analysis Procedures

Obstacles to DOD-Wide Standardization

  Jobs and job categories are different
  Qualifications of ISD and task analysis staffs are different
  Inputs from occupational survey data are different
  User orientation of occupational measurement centers is different

Obstacles to Standardization Within a Single Service

  Different requirements for task analysis associated with combat crew training
  Different resources and professional expertise

Unresolved Issues Regarding the Design of a Task Analysis Manual

  Task analysis for training versus task analysis for multiple users
  Task analysis documentation that is of marginal utility


has taken a great deal of interest in occupational survey data, and the
data that are provided to Air Force training centers are rapidly growing
more sophisticated and better organized. The data provided by the Air
Force are also much more detailed than those provided by the other
services. This is partly a function of the way in which the jobs are
defined. If the job categories are relatively specific, as is true of
Air Force career ladders, it is possible for the occupational survey
information at the task level to also be specific. The Navy, since it
uses broadly defined job categories, almost necessarily uses task
statements that are broadly defined. Otherwise, the task
inventories would be too long, and there would be problems with the
quality of the data.

Other differences between the services are associated with the way
in which the occupational measurement centers are organized. In the
Air Force, the Occupational Measurement Center is part of the Air
Training Command, and training applications are given extremely high
priority. In the other services, the occupational measurement center
is part of a Military Personnel Center, and other uses of occupational
survey data (e.g., classification and job satisfaction studies) have
high priority, while training applications seem to have less priority.

Another  set  of  obstacles  to  standardization  exists  within  each 
service. For example, in the Air Force, task analysis is conducted at
Combat Crew Training Schools (each of which is associated with a major
command)  as  well  as  at  Air  Force-wide  Technical  Training  Centers  (which 
are  part  of  Air  Training  Command).    These  two  parts  of  the  Air  Force 
typically  follow  different  task  analysis  procedures,  and  typically 
approach  the  problem  in  different  ways.    The  Combat  Crew  Training 
Schools  (CCTSs)  conduct  a  very  sophisticated  kind  of  task  analysis 
since  they  must  deal  with  teams  of  personnel  rather  than  individuals. 
These  teams  of  personnel,  moreover,  are  involved  in  complex  combat 
scenarios  with  multiple  weapon  units  and  multiple  delivery  systems. 
The  needs  of  the  two  types  of  Air  Force  organizations  are  so  different 
that there are two sets of complaints about the Interservice ISD manual
(Interservice  Committee  for  Instructional  Systems  Development,  1975). 
Some  of  the  ISD  personnel  at  the  CCTSs  complain  that  these  procedures 
are  too  simple.    The  ISD  personnel  at  the  Technical  Training  Centers 
complain  that  the  same  interservice  procedures  are  too  complex. 

The  need  for  additional  complexity  in  CCTSs  is  illustrated  by  a 
case  in  which  the  tasks  associated  with  a  combat  attack  plane  were 
analyzed  three  times.    First  the  standard  interservice  ISD  procedures 
were  used.    Unfortunately,  these  interservice  task  analysis  procedures 
were  deemed  inadequate  because  there  was  not  enough  emphasis  upon 
performance  objectives.    The  whole  task  analysis  was  redone  using 
Mager's  performance  objectives  (Mager  and  Pipe,  1976)  which  seemed  to 
help,  but  this,  too,  was  not  satisfactory.    The  task  analysis  was 
redone again using combat team descriptions of the tasks as part of
complex combat scenarios. The reaction to the interservice manual is
exactly the opposite at ATC technical training centers, where it is
simply considered too complex. At ATC schools, the interservice manual
is primarily used for guidance in the design of simplified procedures
for local use.

The resources and expertise available for task analysis
also differ sharply between Technical Training Centers and Combat Crew
Training Schools. The CCTSs often use contractors, whereas the Technical
Training Centers tend to use military subject matter experts in
each specialty. The staff assembled by a contractor
organization tend to be very well qualified. They also tend
to be very expensive.

Another set of obstacles exists because of unresolved issues
regarding the design of a task analysis manual. To what extent should
a training task analysis provide information for multiple users of task
analysis information? To what extent, for example, should the performance
objectives designed for training purposes be useful to the people
who establish performance objectives for promotion? Another important
unresolved issue is the question of how much documentation of task
analysis procedures is really needed. Suppose that you have in your
hand a detailed Plan of Instruction (POI) containing behavioral objectives
for each block of instruction. Do you still need a lot of task
analysis documentation to back up that POI, or can it be argued that
the end product is all that is really required? If one is in a military
training command, it is easy to argue that the task analysis documentation
for purposes other than training is not needed, or, if it is
needed, that it should come out of somebody else's budget rather than
your own. Consideration must also be given to documentation that is of
marginal utility. Granted that some documentation is essential, one
can argue that the value provided by additional amounts of documentation
becomes less and less until, eventually, the additional documentation seems
to be more trouble than it is worth.

The  Case  for  Non-Standardized  Task  Analysis  Procedures 

As long as the various military services are staffed differently,
organized differently, and have different kinds of input information,
differing amounts of professional expertise, and differing customer
orientations, a very good case can be made for permitting each service
to have its own task analysis procedures. This does not mean that the
various services cannot derive mutual benefit from improvements in task
analysis methodology or standardized formats for shared information.
In our current work on the design of an Air Force task analysis manual,
we have already contacted people from Air Force Combat Crew Training
Centers as well as people in the Army, Navy, Marine Corps, and Coast Guard.
They  will  be  invited  to  help  themselves  to  any  of  our  methods  that  look 
useful  to  them.    We  are  also  open  to  suggestions  for  standardized 
formats  for  sharing  information  about  the  results  of  task  analysis 
efforts. 

The  Case  for  a  Standardized  Task  Analysis  Data  Bank 

One can make a much stronger case for a standardized task analysis
data bank than for a task analysis manual (see Table 4). There
are many jobs in the military services (e.g., plumbers, carpenters,
machinists) that are so similar that it would be a tremendous waste of
effort if all services were to conduct independent task analyses of
their own. Yet that is exactly what has happened and is still
happening. The job of carpenter, for example, is analyzed
by all military services. This is certainly not the way in which task
analyses are handled at the Vocational Technical Education Consortium
(VTEC) of Southern States (Hirst, 1975). In this consortium, the task
analysis work is divided so that each state only does its proportionate
share of the task analysis work in areas of general interest. Georgia,
for example, may analyze the job of carpenter and Mississippi may
analyze the job of plumber. There are also many advantages involved in
having analysis information available on computer. For example, the
computer can generate field survey sheets that can be used for validation
studies of the task analysis worksheets.
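
The consortium's division of labor can be pictured as assigning each commonly held job to exactly one analyzing organization. The following is a minimal sketch under a simplifying assumption: jobs are handed out in round-robin order, which gives equal rather than truly proportionate shares. The function name and the assignment rule are hypothetical; only the job titles and service names come from the text.

```python
from itertools import cycle


def divide_analyses(jobs, organizations):
    """Assign each shared job to exactly one organization, round-robin."""
    assignment = {}
    for job, organization in zip(sorted(jobs), cycle(organizations)):
        assignment[job] = organization
    return assignment


jobs = ["plumber", "carpenter", "machinist"]
services = ["Army", "Navy", "Air Force", "Marine Corps"]

plan = divide_analyses(jobs, services)
for job, service in plan.items():
    print(f"{job}: analyzed once, by the {service}")
```

Under such a scheme each job is analyzed once and the results are shared, instead of being analyzed separately by every service.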

There  are  problems  with  standardized  data  bank  proposals,  however. 
While  it  is  true  that  a  task  analysis  data  bank  is  highly  cost  effective 
if  one  is  starting  out  from  scratch  to  develop  task  analysis  information 
Table 4  The Case for a DOD-wide Computerized
Task Analysis Data Bank

Incentives

  Minimize duplication of effort
  Document previous efforts that have not been well documented
  Generate work center catalogs for OJT
  Rapid updating
  Facilitate sharing of information
  Computer assembled forms for field surveys

Obstacles

  Uncertainty as to whether documentation is really needed
  Uncertainty as to cost effectiveness
  Prior need for agreement on a standardized format for
    information to be shared


for  all  jobs  in  the  Department  of  Defense,  that  is  not  the  situation  in 
which  we  find  ourselves.    Almost  all  DOD  jobs  have  already  been  analyzed, 
and  Plans  of  Instruction  with  behavioral  objectives  already  exist. 
Under such circumstances, a DOD-wide task analysis data bank is not a
way of avoiding work; it is a requirement to do work that would not
otherwise be accomplished at all. One could conceivably argue that
undocumented  work  is  work  of  poor  quality,  and  that  a  standardized  task 
analysis  data  bank  should  be  required  in  order  to  upgrade  the  quality 
of  the  information.    We  don't  really  know  whether  this  is  true  or  not, 
however,  since  we  do  not  have  acceptable  measures  of  the  task  analysis 
information  that  is  presently  on  file,  nor  do  we  have  good  information 
about  the  cost  advantages  of  redoing  the  work  if  it  is  of  poor  quality. 

One thing is certain: if one already has end products in the form of
behavioral objectives for Plans of Instruction (POIs), it is very
difficult to convince training executives that they should undertake a
massive documentation effort for task analysis data. The training
commands already have the POIs that such an effort would produce, and it
is the POIs, especially POIs that are provided with behavioral objectives,
in which they are most interested.

We  conclude  that  the  disadvantages  of  standardizing  task  analysis 
procedures  outweigh  the  advantages  (see  Table  5).    This  does  not  mean 
that  all  forms  of  standardization  are  undesirable.    We  at  HRL,  for 
example,  are  considering  plans  for  a  task  analysis  data  bank  for 
critical  tasks  that  are  scheduled  for  On-the-Job  Training  (OJT).  The 
objective  is  to  provide  each  work  center  with  a  catalog  containing 
information  about  the  performance  standards,  the  steps  to  be  followed 
in accomplishing the task, the relevant Technical Order references, and
so on, for all the critical tasks in a particular work center. In
developing this task analysis data bank, we do not, however, propose
to re-do the existing task analyses for every job in the Air Force.
The Air Force cannot afford to do such a thing even if we wanted it
to, which we do not.
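
The kind of work center catalog described above can be sketched as a small record structure plus a routine that assembles the entries for one work center. This is only an illustration of the idea, not the planned HRL design: the record fields, the function name, and the sample tasks are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """One task analysis entry; every field name here is hypothetical."""
    task_id: str
    title: str
    work_center: str
    performance_standard: str
    steps: list = field(default_factory=list)
    to_references: list = field(default_factory=list)
    critical: bool = False


def work_center_catalog(records, work_center):
    """Assemble a plain-text catalog of the critical tasks for one work center."""
    lines = [f"WORK CENTER CATALOG: {work_center}"]
    for rec in sorted(records, key=lambda r: r.task_id):
        if rec.work_center != work_center or not rec.critical:
            continue  # only critical tasks for this work center are listed
        lines.append(f"{rec.task_id}  {rec.title}")
        lines.append(f"  Standard: {rec.performance_standard}")
        for number, step in enumerate(rec.steps, start=1):
            lines.append(f"  Step {number}: {step}")
        if rec.to_references:
            lines.append("  TO references: " + ", ".join(rec.to_references))
    return "\n".join(lines)


# Invented sample data for illustration only.
records = [
    TaskRecord("T001", "Inspect tape recorder", "Shop A",
               "No discrepancies on checklist",
               steps=["Open access panel", "Check tape path"],
               to_references=["TO (hypothetical) 00-0"],
               critical=True),
    TaskRecord("T002", "File daily log", "Shop A",
               "Log complete", critical=False),
]
catalog = work_center_catalog(records, "Shop A")
print(catalog)
```

Because only tasks flagged as critical are cataloged, a data bank of this kind can be built up task by task without redoing the existing analyses for every job.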

Granted  that  we  should  not  attempt  to  establish  a  task  analysis 
data  bank  all  at  once,  it  is  still  possible  to  establish  such  a  data 
bank  over  a  period  of  ten  or  twenty  years  by  standardizing  all  new 
task  analysis  documentation.    This  focus  upon  new  documentation  would 
still  permit  us  to  divide  up  the  responsibilities  for  documentation 
among  the  various  military  services,  so  as  to  minimize  duplication  of 


Table 5  Conclusions

Question                          Answer  Comment

Should task analysis procedures   No      Current differences in jobs,
be standardized?                          job categories, staff qualifications,
                                          occupational survey inputs, and
                                          user orientation are too great.

Should information about task     Yes     Communications are excellent
analysis procedures be shared?            in this respect.

Should task analysis data for     No      Even though documentation may
all courses be redone in a                be of poor quality and inconsistent
standardized format?                      from service to service, the task
                                          analyses have already been conducted
                                          and courses have already been
                                          designed. Work would have to be
                                          redone.

Should a task analysis data       Yes     Plans to provide computer-assembled
bank be generated to facilitate           work center catalogs containing
OJT in critical tasks?                    task analysis data for use in OJT
                                          are currently being prepared.

Should standardized output be     Yes     An interservice group of
required when new task analysis           representatives should be asked to
efforts are documented?                   consider procedures for sharing
                                          new task analysis data.


effort. It is a safe assumption that one would, ten or twenty years
from now, have documentation that would be of much higher quality than
it is today. An interservice group of advisors should be asked to
consider this possibility at a conference to be scheduled early next
year.

As mentioned earlier in this paper, many of these questions
require information about the importance of good quality documentation.
The need for documentation is complicated by the fact that many opportunities
exist for quick fixes downstream. Since many other procedures
can also improve the quality of training, how important is the initial
task analysis? We know, for example, that effective use of occupational
survey and task training emphasis data can keep us from overtraining.
We also know that complaints from the field can keep us from undertraining.
So we have two feedback loops that will gradually improve
our courses over a period of several years regardless of the quality
of the initial task analysis.

Those who recommend standardization of documentation would be
completely correct if every service were starting out fresh to conduct
task analyses for every occupation. But they are not. On the contrary,
the typical task analysis requirement nowadays involves a minor scrub-down
of an existing course that is already well defined in terms of
behavioral objectives. The task analysis documentation may be
non-detailed or even non-existent, but we don't really know how important
a lack of documentation is.

One  of  the  reasons  that  large  quantities  of  documentation  seem  so 
attractive  to  many  people  is  that  they  tend  to  think  in  terms  of  task 
analysis  for  systems  design  or  task  analysis  for  new  courses.  People 
tend  to  assume  that  if  a  large  amount  of  documentation  is  good  for  the 
human engineers and the systems designers, then it must be equally
good  for  the  trainers.    In  actual  fact,  a  similar  amount  of  documentation 
may  or  may  not  be  needed  for  the  trainers,  but  we  should  at  least 
recognize the price tag.  If it is needed, we are going to have to
reaccomplish hundreds of man-years of work for which the responsible
training  organizations  do  not  have  adequate  resources.    This  can  be 
redone  all  at  once,  with  a  lot  of  duplication  of  effort--or  it  can  be 
redone  gradually  over  a  period  of  many  years  as  part  of  the  normal 
updating  function. 

The only way to really resolve this issue is to collect systematic
evidence  regarding  the  cost  effectiveness  of  standardized  documentation. 
We  can  certainly  support  the  need  for  this  documentation  without 
equivocation.    However,  until  the  evidence  has  been  collected  and  the 
cost  determinations  have  been  made,  proposals  for  high  priority  stan- 
dardization of  task  analysis  documentation  are  more  than  just  a  little 
bit  late.    They  are  proposals  for  large  expenditures  of  time  and 
effort  without  any  systematic  evidence  that  these  expenditures  are 
really  worthwhile. 




References 

Christal, R.E.  The United States Air Force occupational research
project (AFHRL-TR-73-75, AD-774 574).  Lackland AFB, TX:  Occupational
Research Division, January 1974.

Hirst, B.S., Jr.  The instructional systems model of the Vocational-
Technical Education Consortium of States used to develop performance
objectives, criterion-referenced measures and performance guides
for learners.  In P.E. Schroeder (Ed.), Proceedings of a symposium
on task analysis/task inventories (UN Series No. 10).  Columbus:
The Ohio State University, The Center for Vocational Education,
1975.

Interservice Committee for Instructional Systems Development.  Inter-
service Procedures for Instructional Systems Development (NAVEDTRA
106A, five volumes).  Fort Benning, GA:  U.S. Army Combat Arms
Training Board, August 1975.

Mager, Robert F. and Pipe, Peter.  Criterion referenced instruction:
Analysis, design and implementation.  Participant Manuals (revised
Edition).  Los Altos Hills, CA:  Mager Associates, Inc., 1976.

Montemerlo, Melvin D. and Harris, Ward A.  Angels, pinheads, and task
analysis.  Paper presented at the 1978 Annual Convention of the
American Psychological Association, Toronto, Canada, Sep 1, 1978.




TASK ANALYSIS:  DESTINATION OR JOURNEY


Dr. Melvin D. Montemerlo
Dr. Frank M. Aversano
U.S. Army Training Support Center
Ft. Eustis, VA 23604


The Systems Approach to Training (SAT), as it was known in the
1960's, or Instructional Systems Development (ISD), as it is now known,
has become the pre-eminent concept of modern instructional technology.
More than 100 SAT/ISD manuals have been published during the last quarter
century (Montemerlo and Tennyson, 1976).  Each of these
breaks down the process of course development into a series of
steps designed to be carried out by personnel with
little or no background in instructional design (Klein, 1977).  Although
[...] accomplished, they all agree on [...] proceduralizable (that
is, it can be reduced to a [...] pre-stated sequence of
[...]).  [...] SAT/ISD manuals [...]
than the filling out of a pre-designed form on each of the [...]
trained.


The SAT/ISD view that task analysis [...] procedure [...]
and which [...].
Unfortunately, task analysis has come to be viewed as a destination,
and not as a journey.  [...]
the latter analogy is more appropriate.  In any given task analysis, the
[...] available,
and the co-operativeness of the people being analyzed.

Task analysis is a purely rational process (in [...]
philosophy is a purely rational process).  [...] scientific
method came about as a recognition of the limitations of pure rational
processes.  [...] that rocks fall faster than feathers,
[...] faster than light [...].
The scientific movement didn't really throw out the armchair; it merely
asked the person sitting in it to [...] thinking, and
test out his conclusion.  Task analysis has been [...] with a 100% re[...]

[...] consider the set of all possible task analyses of that
job.  [...] the best possible task analysis was in hand
until the entire universe of task analyses on that job was done.  But
[...] training program, one [...] develop a [...]
[...] the relative goodness of the
resulting courses.  In case of a tie, [...] the analysis which was the easier
to use would be the winner.


Anyone who doubts that there would [...] analyses
which are [...] little
experiment.  Select a very simple [...] and have any two people analyze
it independently.  (Three, four [...] people are even better.)  While
in his doctoral program at Penn State, [...] Montemerlo participated in
such an exercise.  One Friday, one of his professors (in an instructional
technology course) asked the five students to do a task analysis of a
relatively straightforward mathematical task.  The [...]
Monday with documents ranging in length from a few pages [...]
volume.  The professor asked the students to [...]
by each of the other [...]
of their analyses; he didn't [...] them.  The dismayed students
were given the following explanation:  "I asked you to do a task analysis.
I gave you the task, but I [...] purpose your task
analysis was to serve.  [...] can't do a meaningful
task analysis."  [...] clear and his medium was effective.
[...] students the goal of the analysis
[...] differences in the result.

Doing a task analysis is much like taking a projective test.
There is monumental room for variance in the results.  [...]
for this is provided by John Holt (1976) in "Instead
of Education."  He states,

"It may be true, at the level of words, to say that anyone doing
a difficult thing well is using a [...]
not mean that the best [...] to break it
down to as many separate skills as possible and teach them one by one."

Holt is pointing out the artificiality that necessarily exists
whenever human performance, which is continuous, is sliced into discrete
"tasks."

Holt's hypothesis is not new.  He attributes it to Alfred North
Whitehead.  Another great educator who espoused [...] R. B. Miller,
the father of modern task analysis.  Miller (19bb) [...]

"A task is a fairly arbitrarily [...]
rigorous operational definition cannot (and therefore [...] not) be
sought.  It is a heuristic term."

At the Air Force's ISD Conference (The Pentagon, [...]) [...] described
[...].  All have to do
with the task analysis portion of ISD.  They are, that:

1)  any  task  can  be  reduced  to  a  series  of  stimuli  and  responses, 

2)  the  resulting  task  breakdown  is  the  best  way  to  teach  the  task, 

3)  the  personnel  who  can  do  a  task  best  can  do  the  best  analysis, 

4)  a whole task is nothing more than the sum of its parts,

5)  defining "successful performance" of a task is straightforward, and

6)  complexity always yields to successful analysis.

Montemerlo (1976) added four more unsupportable assumptions to Cream's
list:

7)  task analysis is a non-political process,

8)  the process requires no creativity,

9)  there is one best way to teach any task, and

10)  one method of analysis is best for all tasks.

The  hypothesis  that  tasks  can  be  broken  down  into  a  linear  sequence 
of  stimuli  and  responses  has  been  highly  attractive  for  some  time.  It 
was  one  of  the  foundations  of  the  behavioral  or  "S-R"  school  of  psychol- 
ogy.   The  hypothesis  that  tasks  can't  be  broken  down  has  also  been  highly 
attractive  for  about  the  same  length  of  time.    It  was  one  of  the  founda- 
tions of  the  cognitive  school  of  psychology  and  was  popularized  by  the 
Gestaltists.    Reading  up  on  the  history  of  these  two  schools  may  provide 
some  insight  into  the  future  of  task  analysis.    In  short,  the  two  schools 
melded.    The  pure  S-R  approach  was  soon  given  up  as  untenable  and  the  pure 
cognitive approach was given up as not very useful.

The S-R people inserted an "O" between the S and the R.  The "O"
stood for "organism", and was inserted in recognition of the fact that
organisms process information coming from the stimulus before reacting
to it.  The fact that this information processing exists is obvious,
but the fact that it is not open to direct observation is just as obvious.
"Intervening variables" and "hypothetical constructs" were hypothesized
to account for what goes on during this processing.  The neo-behaviorists
went so far as to hypothesize countless undetectable little s's and r's
(called kinesthetic and proprioceptive cues) which occurred between each
external stimulus, S, and each response, R.  With that development, the
behaviorists  and  the  cognitivists  came  to  complete  agreement:  objective 
analysis  and  scientific  studies  combined  could  not  describe  human  behavior 
as  a  series  of  stimuli  and  responses. 

The  most  significant  step  forward  in  task  analyses  in  recent  years 
can be found in Klein's 1977 paper entitled "Phenomenological Approach
to Training."  He gives a brilliant and forceful argument for recognizing
that  rational  task  analysis  has  its  limitations.    He  finds  ISD-type 
approaches  to  task  analysis  suitable  for  procedural  tasks  but  not  for 
affective  skills  or  for  complex  perceptual  and  motor  tasks.    He  finds  it 
useful  for  describing  the  relatively  choppy    performance  found  at  initial 
stages  of  learning  but  not  for  describing  the  smooth,  highly  proficient 
performance of experts.  Klein states:

"Instructor pilots working on ISD teams are frequently charged with
developing ISD descriptions of complex performance.  They prepare such
descriptions, but will typically admit, on an informal basis, that they
do  not  follow  those  ISD  steps  while  flying." 

Klein's  major  argument  is  that  as  a  person  increases  in  skill  on  a 
task,  he  experiences  shifts  in  perspective  concerning  the  task.    G.  A. 
Miller  (1956)  had  introduced  this  idea  originally  under  the  rubric  of 
"chunking."    What  is  important  to  the  novice  is  often  subsumed  into 
larger  chunks  of  behavior  and  is  no  longer  consciously  thought  about.  The 
novice  billiards  player,  for  instance,  worries  only  about  making  his  next 
shot,  while  the  expert  is  thinking  many  shots  in  advance. 

Holt  (1976)  makes  a  similar  point.    He  feels  that  analyzing  behavior 
into  tasks  is  artificial  in  that  expert  behavior  is  not  a  unitary  concept. 
That  is,  he  believes  that  one  does  not  stop  "learning"  at  some  point  in 
time and then start "doing".  Great musicians, such as Van Cliburn and
Earl Scruggs, didn't become experts at some point and then level off.  To
Holt,  "learning"  is  "doing"  and  both  continue  until  you  die.    ISD  methods 
generally  recognize  only  two  levels  of  performance,  unacceptable  and 
acceptable.    Anyone  who  has  ever  done  a  task  analysis  and  attempted  to 
come up with the "acceptable" standards of performance knows the frustration
involved and can appreciate Holt's argument.

Instructional  technologists  in  the  Navy  know  that  there  is  an  "East 
Coast  Navy"  and  a  "West  Coast  Navy".    Those  in  the  Air  Force  know  that 
there  is  a  school  way  of  flying  and  many  different  operational  ways. 
Those  in  the  Army  know  that  personnel  holding  the  same  MOS  (Military 
Occupational  Specialty)  may  have  little  overlap  in  the  tasks  they  perform. 
The  bottom  line  is  that  the  real  world  of  task  analysis  is  much  more  complex 
than  it  appears  in  the  ISD  literature. 

The  recommendation  of  this  paper  is  not  to  stop  doing  task  analysis 
just  because  it  has  difficulties,  but  to  realize  that  it  is  a  journey  and 
not  a  destination.    As  with  any  journey,  you  should  begin  a  task  analysis 
only  after  you: 

1)  know  where  you  intend  to  go. 

2)  are  willing  to  pay  the  price. 

3)  are  prepared  for  emergencies  and  changes  in  venue. 

4)  have  someone  along  who  knows  the  way. 

5)  are  prepared  to  stop  every  so  often  to  assess  your  progress. 

6)  realize  that  someone  is  waiting  for  you. 

7)  are  ready  to  enjoy  your  trip. 

Task analysis may be compared to a specific type of journey: a
pioneering,  exploratory  voyage.    You  can  be  sure  that: 

1)  you  will  be  criticized  for  going. 

2)  you  will  be  criticized  for  the  route  you  take. 

3)  you  will  find  the  road  rocky  at  points. 

4)  there will be dissension in the ranks somewhere along the line.


5)  there  will  be  some  stressful  times  and  some  heartrending  decisions. 

6)  someone  who  takes  the  same  journey  after  you  will  surely  take  a 
better route, especially if he has your trip report to work with.

As pressures to lengthen occupational surveys grow, four fundamental
criteria for developing task inventories become increasingly important.
These  essential  criteria  are  (1)  each  task  of  the  inventory  must  be 
time-ratable,  (2)  each  task  must  communicate  in  the  language  of  the 
specialty,  (3)  each  task  is  mutually  exclusive  of  other  tasks  in  the 
inventory,  and  (4)  each  task  must  differentiate  among  workers  where 
actual task performance differs.  Besides the communicative, interpersonal,
and judgmental skills necessary to elicit job information from career
field  incumbents,  describing  the  tasks  that  make  up  an  occupational 
specialty  is  a  blend  and  compromise  of  these  four  fundamental  criteria 
for  task  writing. 

THE  PROBLEM: 
LENGTH VERSUS DESCRIPTIVENESS

Theoretically, each occupational specialty should be described at
the lowest level of work activity, with activities (tasks) describing a
complete and inseparable operation.  Often, however, occupational areas
are so broad that task description at the lowest level is impractical,
because inventory length makes job incumbent response
unreasonable.    Thus,  task  inventory  development  is  a  matter  of  compromise 
between  reasonable  task  list  length  and  the  writing  of  tasks  that  adhere 
to  the  four  fundamental  criteria. 

From  the  practical  viewpoint,  a  task  inventory  for  an  occupational 
specialty  is  no  more  than  a  sample  of  the  infinite  number  of  activities 
available for descriptive purposes.  The desirable task inventory
captures the essence, or intrinsic nature, of the occupational specialty.
It consists of a comprehensive, yet representative set of activities for
each subarea of the specialty.  Task inventories rarely include all
activities comprising an occupational specialty, not only because of the
practical constraint of inventory length, but also because of the infinite
number of ways that a specialty may be described.



The problem of task list development, then, is to describe the
essence  of  an  occupational  specialty  with  a  sample  of  tasks  written  at 
the  lowest  level  of  specificity  consistent  with  the  constraints  of 
length,  the  fundamental  criteria  that  each  task  must  meet,  and  the 
purposes  which  the  ultimate  job  analysis  is  to  serve.    Regarding  the 
purpose  of  the  survey,  the  results  of  occupational  analysis  may  be 
employed  for  a  variety  of  personnel  management  purposes.    Some  of  these 
purposes  may  be  adequately  served  with  task  inventories  at  a  general 
level  of  specificity,  while  others  demand  greater  specificity.  Yet, 
United  States  Air  Force  experience  suggests  that  more  detailed  task 
lists  are  most  productive. 


EVOLUTION  OF  TASK  INVENTORIES  IN  THE  USAF 


Over  the  past  11  years  of  operational  occupational  analysis  in  the 
US  Air  Force,  more  than  300  occupational  areas  have  been  analyzed  and 
described.  This effort required the writing of over 150,000 tasks which
were  administered  to  over  700,000  job  incumbents. 

In  the  early  years  of  the  operational  experience,  task  inventories 
followed  the  model  established  through  ten  years  of  research  of  occupa- 
tional analysis  techniques.    Since  a  major  objective,  if  not  the  primary 
one,  of  early  task  lists  developed  during  the  research  period  was  to 
support  the  Air  Force  classification  process,  the  early  task  lists 
tended  to  be  less  detailed.    The  average  number  of  tasks  was  about  350, 
and  rarely  did  a  list  exceed  500  tasks.    The  broadly-written  tasks 
contained  in  these  inventories  provided  information  about  the  subdivisions 
of  the  specialty  upon  which  classification  decisions  could  be  made. 

Other users of the data soon developed, primarily training managers
and  curriculum  development  personnel.    These  users  began  to  request  more 
detail.    Presently  task  lists  for  the  simpler  specialties  range  from 
350  to  600  tasks;  for  the  more  complex  specialties  they  may  average  as 
many  as  1000-1200  tasks.    The  longest  USAF  inventory,  used  to  describe 
the  variety  of  jobs  in  the  Communication-Electronics  Officer  Utilization 
Field,  contained  1,435  tasks.    So  far,  there  is  no  evidence  that  these 
longer  inventories,  if  carefully  constructed,  have  any  deleterious 
effects  on  the  stability  of  incumbent  responses. 

These  longer,  more  detailed  inventories  provide  more  complete 
information  for  training  decisions  and  at  the  same  time  provide  specific 
information  for  later  users  of  the  data.    These  most  recent  users  include: 
promotion  testing;  management  engineering;  maintenance  engineering;  and 
personnel  research  into  such  areas  as  aptitude  requirements,  job  satis- 
faction, and  job  difficulty. 

Since the goal of these longer inventories is more precise data, it
is essential to apply the four fundamental criteria for developing task
inventories.


FUNDAMENTALS OF TASK DEVELOPMENT


In 1967, in Air Force Human Resources Laboratory Technical Report PRL-
TR-67-11, Morsh and Archer published a Procedural Guide for Conducting
Occupational Surveys in the United States Air Force.  This guide remains
the single best source of task writing procedures, and the criteria
described below are readily found in that source.  This paper intends to
elaborate the criteria in light of the 150,000 plus tasks that have been
written during the intervening years.

In developing tasks to describe an occupational field, the occupational
analyst is charged with writing tasks that meet the following
fundamental  criteria: 

a.  Each task is time-ratable in that the job incumbent can
reasonably estimate the relative amount of time he or she spends on each
task.  This criterion normally eliminates tasks that begin with such
words as "insure," "have responsibility for," and "understand," which make
it difficult or impossible to determine the relative time devoted to
this activity.

b.  Each task communicates in the language of the specialty.  The
task statement must be clear so that it is easily understood by career
field incumbents, the people who must answer the questionnaire.  Terminology
consistent with current usage in the career field leaves less
chance for error or differing interpretations of task statements.

c.  Each task is mutually exclusive of other tasks in the inventory;
that is, whether a job incumbent indicates that he or she performs a
task must be independent of his or her performance of all other tasks in
the inventory.

d.  Each task will differentiate among workers where actual task
performance differs because of such factors as differences of jobs,
experience level (apprentice, journeyman, technician), organizational
level (command, staff, base, flightline, or shop), and whether or not
the  person  is  a  supervisor. 


DISCUSSION 


In  this  section  we  will  elaborate  the  four  fundamental  criteria  for 
task development with examples, then examine the level of detail and its
influence on the last two criteria.  Finally, we will look at questions
which serve to guide the task developer in setting the level of detail
in a particular occupational survey.

Time-Ratable


Tasks  must  be  time-ratable  in  order  to  depict  clearly  the  relative 
time spent on a particular task.  Words or phrases which are not time-
ratable  often  creep  into  inventories  if  this  basic  criterion  is  not 
stressed.    Examples  of  words  and  phrases  which  may  not  be  time-ratable 
include  the  following:     "insure,  assure,  assist,  control,  monitor, 
coordinate, recommend, determine, know how to, understand, have knowledge
of," and "have responsibility for."  These words and phrases are vague
and  can  prevent  the  respondent  and  the  occupational  analyst  from  deter- 
mining when  the  task  started  and  finished.     In  addition,  some  of  these 
examples  are  not  behaviors,  but  rather  knowledges,  like  the  word  "under- 
stand."   Some  of  these  same  words,  however,  can  be  used  as  time-ratable 
tasks  depending  on  the  career  field;  for  example,  while  "to  monitor 
supply  accounts"  may  not  be  a  time-ratable  task  for  a  supply  apprentice, 
"monitoring  the  scope"  may  well  be  a  time-ratable  task  for  a  weapons 
controller  who  actually  does  monitor  a  radar  scope  for  six  hours  of  his 
or her eight-hour day.  Also, the word "assist" can refer to a specific
task in medical specialties, such as "assist the surgeon" in operating
room procedures, while in other career fields the word "assist" is very
vague.  The same word, "assist", in the machinist shop could mean stand
close by and watch, set up the equipment, hold the equipment in place,
clean up afterward, or actually do the task under supervision.
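
As an illustration only, the word list above could serve as a first-pass screen on draft task statements. The sketch below is in Python; the function name and the third sample task are hypothetical and not part of any Air Force program. It flags statements that open with one of the listed terms so that an analyst can review them. It does not reject them, since the same word may be time-ratable in one career field and vague in another.

```python
# Hypothetical screening aid: flags task statements whose opening words appear
# on the list of possibly non-time-ratable terms given in the text.
FLAG_TERMS = (
    "insure", "assure", "assist", "control", "monitor", "coordinate",
    "recommend", "determine", "know how to", "understand",
    "have knowledge of", "have responsibility for",
)


def flag_for_review(task_statement):
    """Return the flagged opening term, or None if the statement opens otherwise."""
    text = task_statement.strip().lower()
    # Check longer phrases first so multi-word terms are matched whole.
    for term in sorted(FLAG_TERMS, key=len, reverse=True):
        if text == term or text.startswith(term + " "):
            return term
    return None


# The first two statements come from the text; the third is an invented example.
tasks = [
    "Monitor supply accounts",
    "Assist the surgeon in operating room procedures",
    "Repair hydraulic pumps",
]
for task in tasks:
    print(task, "->", flag_for_review(task))
```

A flagged statement still needs the analyst's judgment about how the word is used in the career field concerned.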

Communicates  in  Language  of  Specialty 

Each  task  must  communicate  in  the  language  of  the  specialty.  To 
reduce  the  possibility  for  error  or  misinterpretation,  USAF  experience 
indicates  that  it  is  clearest  to  construct  inventories  using  the  current 
language  of  the  career  field.    Terms  used  in  the  daily  work  of  the 
career  field  have  a  definite  meaning  to  incumbents.     In  addition,  there 
are  certain  dangers  in  depending  on  an  external  source  of  definition, 
like  a  glossary  of  verbs.    For  example,  if  a  glossary  is  used  that 
depends  on  some  impersonal  source  of  vocabulary  other  than  the  definitions 
in  common  usage  by  career  field  members,  respondents  could  make  the 
following errors which would lead to unreliable task ratings:  forget to
read or simply skip reading the glossary, forget the glossary definition,
or confuse the provided definition with the common usage version.  We
have  found  that  more  solid  and  stable  responses  are  obtained  by  using 
the  language  respondents  use  every  day  on  their  job. 

It  takes  skilled  occupational  analysts  to  differentiate  subtle 
shades  of  meaning  which  exist  in  career  field  vocabulary  and  clarify 
task  statements  in  such  a  way  as  to  prevent  misinterpretation.  For 
example,  in  some  career  fields  the  word  "troubleshoot"  means  to  isolate 
the  problem,  whereas  in  other  career  fields  the  same  word  means  to  both 
find and fix the problem.  In this example, if a standard glossary
definition differed from usage, the word, and consequently the task,
would be subject to much misinterpretation, resulting in unreliable
responses.

Mutually Exclusive of Other Inventory Tasks


Each task must also be mutually exclusive of other tasks in the
inventory.    If  two  or  more  tasks  are  mutually  dependent  (that  is,  if  one 
task  is  performed,  the  other  task  must  also  be  performed,  or  vice  versa) 
these  tasks  would  be  more  correctly  called  subtasks  or  task  elements  of 
a parent task.  For example, in a weather observer inventory the tasks
"determine wind speed" and "determine wind direction" are really subtasks
of the parent task, "make wind observations", since every time the
observer does one task he or she must do the other task.  The parent
task, then, could more succinctly describe both activities, and therefore
shorten the inventory.  Another reason for dropping the subtasks from
the inventory is that mutually dependent tasks may falsely inflate the
relative time spent in a duty area by forcing the respondents to indicate
multiple responses for essentially the same task.  Also, if a parent
task  is  used,  no  information  regarding  percent  members  performing  would 
be  lost  by  dropping  the  subtasks  in  favor  of  the  parent  task. 

In  CODAP  programs,  each  task  is  valued  equally,  even  if  it  is 
in fact a subtask and not mutually exclusive of other tasks in the inventory.
When two subtasks are used instead of their parent task, responses
of time spent and percent members performing are recorded on two tasks
instead  of  one.     If  two  subtasks  are  used  instead  of  the  one  parent 
task,  groups  performing  the  parent  task  could  appear  more  similar  in 
the  cluster-merger  diagram.     Individuals  or  groups  responding  that  they 
do  not  perform  the  parent  task  could  appear  more  different  than  they 
would  have  had  they  marked  only  one  task  negatively,  rather  than  the  two 
subtasks.    Thus,  a  more  representative  picture  of  the  career  ladder's 
structure  can  be  obtained  by  using  a  parent  task,  rather  than  its  subtasks. 
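
The inflation described above can be shown with a small worked example. The Python sketch below rests on a simplifying assumption: that percent time spent on a task is the respondent's rating for that task divided by the sum of all of his or her ratings. This is not a reproduction of the CODAP computation. The ratings and the task "encode weather reports" are invented for illustration.

```python
def percent_time(ratings):
    """Convert one respondent's relative-time ratings to percent time spent.

    Assumption for this sketch: percent time = rating / sum of ratings * 100.
    """
    total = sum(ratings.values())
    return {task: 100.0 * rating / total for task, rating in ratings.items()}


# Inventory version 1: wind observation appears once, as a parent task.
with_parent = {
    "make wind observations": 5,
    "encode weather reports": 5,   # invented comparison task
}

# Inventory version 2: the same work appears twice, as two dependent subtasks.
with_subtasks = {
    "determine wind speed": 5,
    "determine wind direction": 5,
    "encode weather reports": 5,
}

wind_parent = percent_time(with_parent)["make wind observations"]
pct = percent_time(with_subtasks)
wind_subtasks = pct["determine wind speed"] + pct["determine wind direction"]

print(round(wind_parent, 1))    # 50.0
print(round(wind_subtasks, 1))  # 66.7
```

Under that assumption, the same respondent doing the same work appears to spend half of the time on wind observation in one inventory and two-thirds in the other, solely because the activity was listed twice.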


Differentiates Among Career Field Members

Each  task  must  differentiate  among  career  field  members  where 
actual  task  performance  differs.    For  example,  if  an  apprentice  can  do 
only  parts  of  a  task  under  supervision,  the  journeyman  can  do  the  entire 
task,  and  the  technician  can  supervise  the  task  as  well  as  perform  it, 
the  task  inventory  should  include  items  which  enable  the  occupational 
analyst  examining  the  cluster-merger  diagram  to  make  these  distinctions. 
In  order  to  distinguish  between  groups  which  make  up  an  occupational 
specialty, the tasks must be written at a sufficiently specific level of
detail.    For  example,  if  an  inventory  only  lists  journeyman  tasks, 
excluding  supervisory  and  apprentice  tasks,  most  members  of  the  career 
field  will  not  be  separated  by  their  performance  level.     If  the  inventory 
does  not  allow  respondents  to  choose  tasks  which  distinguish  the  levels, 
the  occupational  analyst  scrutinizing  the  cluster-merger  diagram  cannot 
find these differences.

Level of Detail

In  terms  of  level  of  detail,  the  last  two  criteria  are  the  most 
important.    As  we  have  seen  above,  if  two  (or  more)  tasks  are  mutually 
dependent,  or  if  they  do  not  differentiate  the  level  of  work,  what  will 
be  learned  from  data  collection  on  these  items  is  likely  to  be  spurious. 
Differentiation is critical for defining the various kinds of jobs which
make up a specialty.  The key to differentiation resides in the level of
specificity of tasks.  Also, if the tasks are mutually exclusive, task
differentiation  is  enhanced.     In  describing  tasks,  it  is  essential  to 
first  determine  if  any  activity  consists  of  concomitant  elements  in 
which  the  activity  is  a  parent  task.    This  characteristic,  alone,  assures 
differentiation.     But  just  assuring  differentiation  in  today's  environment, 
when  longer  inventories  are  needed,  is  not  sufficient. 

In  many  cases  it  may  be  necessary  to  f i  id  some  way  to  combine 
parent  tasks,  which  would  normally  stand  aloie  in  an  inventory.  For 
/.vamnlP.  th*?  cnmhi nation  of  parent  tasks  may  be  necessary  because  an 
inventory  is  too  long.     In  this  case,  an  additional  criterion  for  task 
writing  is  essential:    the  occupational  analyst  should  determine  if  any 
tasks,  which  are  hein^  considered  for  combination  are  performed  differ- 
ently by  job  incumbents.    That  is,  whether  they  always  exist  concomitantly 
at  any  given  job  location,  job  experience  level,  etc.     Jf  parent  tasks 
do  exist  concomitantly  at  all  levels  and  locations,  then  the  level  of 
detail  may  be  set  at  a  more  general  level  and  the  parent  tasks  may  then 
be  combined.     In  these  longer  inventories  where  space  is  at  a  premium, 
tasks  which  have  high  similarity,  or  tasks  which  could  be  accomplished 
without additional training, can be combined into a more general and
inclusive task. For example, instead of listing all 150 preflight
inspection checklist items separately, the task, "conduct preflight
inspection on (Type of Aircraft)" could be used if the preflight inspection
is conducted the same way at all locations and experience levels
but differs by aircraft. In this case, differentiation between those
parent tasks is not necessary, even though the tasks may not be mutually
dependent.
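
The combination criterion above can be sketched in code. This is only an illustration under stated assumptions: the data layout (a set of performed tasks for each location and experience level group), the function name, and the task names are all hypothetical and do not come from the occupational survey programs. Two tasks are treated as candidates for combination only if, in every group, they are performed together or not at all.

```python
from itertools import combinations

def always_concomitant(task_a, task_b, tasks_by_group):
    """True if, in every group, both tasks are performed or neither is."""
    return all((task_a in tasks) == (task_b in tasks)
               for tasks in tasks_by_group.values())

# Hypothetical data: tasks performed in each (location, experience level) group.
tasks_by_group = {
    ("Base A", "apprentice"): {"check tire pressure", "check oil level"},
    ("Base A", "specialist"): {"check tire pressure", "check oil level",
                               "inspect ejection seat"},
    ("Base B", "apprentice"): set(),
    ("Base B", "specialist"): {"check tire pressure", "check oil level"},
}

# Every pair of tasks that could be merged into one more general task.
all_tasks = sorted(set().union(*tasks_by_group.values()))
combinable = [(a, b) for a, b in combinations(all_tasks, 2)
              if always_concomitant(a, b, tasks_by_group)]
print(combinable)
```

In this made-up example only the two checklist tasks qualify; the third task is performed by one group alone, so merging it would hide a real difference among workers.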

This  new  criterion,  or  exception  to  the  rules  of  mutual  exclusivity 
and  differentiation,  then  may  help  shorten  today's  longer  multiladder 
inventories. The level of detail can be adjusted according to this
criterion without causing spurious data collection which results from
failing to follow the four fundamental criteria.

SUMMARY


As occupational analysis becomes more sophisticated, occupational
survey task inventories have become longer. The added
length results from impetus to meet the following objectives: to describe
tasks at the lowest level of work activities which describes a complete
and inseparable operation, to provide technical training schools with
the most useful data to structure their courses, and to best describe
career  field  structure  to  classification  interests  by  multi-  or  cross- 
ladder  surveys.    Longer  surveys  make  critical  four  fundamental  criteria 
for  describing  occupational  survey  tasks.    These  criteria  are  (1)  each 
task  must  be  time-ratable,  (2)  each  task  must  communicate  in  the  language 
of  the  specialty,  (3)  each  task  must  be  mutually  exclusive  of  other 
tasks  in  the  inventory,  and  (4)  each  task  must  differentiate  among 
workers where actual task performance differs. Compromise between these
criteria is often necessary in the practical world. The appropriate
level of detail is determined by carefully balancing criteria three and
four. Setting the level of detail at the appropriate point maximizes
the  information  to  be  gained  from  task  inventories  and  minimizes  the 
length  to  provide  accurate  data  to  users  of  the  occupational  survey 
program.

TWO APPLICATIONS OF OCCUPATIONAL
SURVEY  DATA  IN  MAKING  TRAINING  DECISIONS 

Capt.  David  S.  Vaughan 
ATC  Technology  Applications  Center 


In  this  era  of  ever-tightening  budgets,  it  has  become  ex- 
tremely important  that  formal  training  content  match,  as  closely 
as  possible,  actual  job  requirements.    We  can  afford  neither 
overtraining,  which  wastes  training  resources,  nor  undertraining, 
which  increases  the  on-the-job  training  load  and  detracts  from 
primary  mission  accomplishment.    One  very  useful  source  of 
information  for  constructing  job-relevant  training  programs  in 
the  Air  Force  is  the  occupational  survey.     Occupational  surveys 
are  accomplished  on  a  routine  basis  for  most  Air  Force  enlisted 
job  specialties  by  the  USAF  Occupational  Measurement  Center. 
Procedures  used  in  these  occupational  surveys  are  described  in 
Morsh  and  Archer  (196?). 

Data  available  to  trainers  from  occupational  surveys  include 
the  percent  of  airmen  in  a  specialty  (or  in  an  identifiable  sub- 
group of  the  specialty)  who  perform  any  given  task,  the  relative 
time  spent  on  each  task,  and  task  learning  difficulty.    As  may 
be  seen,  this  sort  of  information  can  be  very  useful  for  making 
training decisions. However, several important questions are
not answered. In some job specialties, the criticality of a
task plays an important role in determining training requirements.
Task criticality is not directly assessed in conventional occupational
surveys and may not have a close relationship with percent
members performing or the other normal occupational survey variables.
Consider, for example, the 571X0, Fire Protection, Air
Force job specialty. In this specialty the most critical tasks
and those for which training is most needed, such as putting out
fires,  are  tasks  that,  hopefully,  are  seldom  actually  performed  on 
the  job.     A  second  major  question  concerns  the  procedures  which 
should  be  used  to  combine  data  on  the  several  conventional  occu- 
pational survey  variables  into  one  index  for  ranking  tasks  for 
training.     For  example,  if  task  A  has  high  percent  members  performing 
and  moderate  difficulty,  while  task  B  has  moderate  percent  members 
performing  and  high  difficulty,  which  should  receive  more  emphasis 
in  training?    Without  guidance  concerning  how  to  combine  the  occu- 
pational survey  variables  in  making  training  decisions,  attention 
may  be  focused  on  one  of  the  variables  to  the  exclusion  of  the 
others,  or  an  arbitrary  combination  rule  may  be  used  which  is  less 
than  optimal. 

Recently, the Air Force Human Resources Laboratory (AFHRL)
completed  a  research  project  which  was  designed  to  provide 
solutions  to  the  two  problems  outlined  above.     Progress  reports 
concerning  this  research  have  been  presented  at  several  recent 
conferences (Mial & Christal, 1974; Mead, 1975; Stacy, Thompson,
& Thomson, 1977). Mr. Hendrick Ruck (Ruck, Thompson, & Thomson,
1978) will present a detailed description of this research and
[...] training research along with [...] a detailed discussion of
these research results.

My organization, the Air Training Command (ATC) [...] Applications
Center, [...] on a test basis, evaluate [...] projects. In the first
project, occupational survey data are used for the narrow purpose of
revising an existing apprentice-level [...]. [...] project [...]
resident and on-the-job training (OJT) [...] job specialty. The
specific procedures being field-tested in both of these projects
will be discussed below. First, [...]. The main product of the
AFHRL research being discussed here is a new occupational survey
scale--field recommended training emphasis. This scale is
illustrated in figure 1. Data on this scale are gathered by having
senior noncommissioned officers in the job specialty under
consideration rate each task [...] indicate the extent to which
[...] in formal training for first assignment, apprentice-level
airmen. The ratings do not, however, [...] various forms of formal
training, such as resident [...] training detachment, or OJT. Two
main [...] of distinction. First, on a logical basis, NCOs [...] do
not have certain types of information--resource availability at
technical training centers, for example--which [...]. Secondly,
Mead (1975) showed that field NCOs have low agreement concerning
resident training vs OJT. In [...] research showed that, in most
job specialties, data gathered on the field recommended training
emphasis scale have high interrater agreement. Furthermore, the
AFHRL research showed that ratings on [...] predictable [...]
factors which the [...]. [...] this new field recommended training
emphasis scale is both reliable and valid.

Course Revision Project

The first of the two applications projects which we are conducting
with this new [...] has the relatively narrow goal of revising (or
constructing) an apprentice-level resident training course. The
procedures being tested in this project [...] decisions have
already [...]. The first assumption is that an apprentice-level
resident training course of some sort will exist
in the particular job specialty being examined. Secondly, it
is assumed that the "audience" for this course has already been
defined. The previously-defined "audience" might, for example,
be all airmen entering the job specialty or airmen entering the
specialty who will be assigned to the Strategic Air Command.
Finally, it is assumed that some general decisions have been made
concerning jobs to be performed by airmen at various skill levels
in the specialty (in particular, that an acceptable Specialty
Training Standard, or STS, exists). Of course, it is recognized that
the  outcome  of  the  procedures  being  tested  in  this  project  may 
result  in  modification  of  these  decisions. 

The  goal  of  these  procedures  is  to  translate,  in  a  simple 
and  direct  manner,  occupational  survey  data  into  course  content. 
The  procedures  should  allow  each  topic  covered  in  the  course  to 
be  traced  back  to  one  or  more  tasks  which  were  identified  for 
inclusion. 

The  first  step  of  the  procedures  is  to  select  tasks  for  in- 
clusion in  the  course.     For  this  purpose,  a  special  occupational 
survey  printout  is  provided.    This  printout  is  illustrated  in 
figure 2. On the printout, all tasks are listed in order of their
field recommended training emphasis. Beside each task is printed
the field recommended training emphasis, the percent of first-assignment
airmen performing the task, and the task difficulty. Using
this printout, training personnel consider each task for inclusion
in  the  course.     In  general,  tasks  with  high  field  recommended  train- 
ing emphasis  are  the  ones  which  should  be  included  in  the  course. 
However,  it  is  recognized  that  many  other  considerations  may  also 
be  important  in  determining  course  inclusion  for  any  given  task. 
For  example,  a  task  with  high  field  recommended  training  emphasis 
might  be  excluded  from  resident  training  if  that  task  has  low  dif- 
ficulty and/or  percent  members  performing,  or  if  equipment  necessary 
to  train  that  task  cannot  be  made  available  at  the  training  center. 
Consideration might be given to including a task with low field
recommended training emphasis if that task has high difficulty
and  percent  members  performing  or  if  that  task  has  a  great  deal  of 
content  overlap  with  other  tasks  already  included  in  the  course. 
When  available,  information  from  sources  other  than  occupational 
surveys  should  also  be  used  in  making  task  decisions.     For  each  task, 
the  reasons  for  the  decision  to  include  or  exclude  are  documented 
in  a  brief  note  beside  the  task  on  the  printout.    Certain  rules  of 
thumb  are  provided  to  simplify  this  documentation  requirement. 
First,  any  task  whose  field  recommended  training  emphasis  is  at 
least  one  standard  deviation  above  the  mean  requires  written  docu- 
mentation only  if  that  task  is  to  be  excluded  from  the  course  (i.e., 
high  field  recommended  training  emphasis  is  sufficient  to  include 
a task unless other considerations dictate that the task be excluded).
Any task whose field recommended training emphasis is
at  least  one  standard  deviation  below  the  mean  requires  written 
documentation  only  if  it  is  to  be  included  in  the  course  (i.e., 
low  training  emphasis  is  sufficient  for  exclusion,  unless  other 
considerations  are  important).     Only  tasks  whose  field  recommended 
training emphasis is within one standard deviation of the mean
require documentation for either inclusion or exclusion. Training
emphasis data does not provide an unambiguous answer for these
"middle range" tasks, and other considerations will always be
important.
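
The documentation rules of thumb can be illustrated with a short sketch. The function name, the task labels, and the emphasis values below are hypothetical, and the use of the population standard deviation is an assumption; the paper does not say which form of the standard deviation is intended.

```python
from statistics import mean, pstdev

def documentation_rule(emphasis_by_task):
    """Map each task to the documentation rule implied by its training emphasis."""
    values = list(emphasis_by_task.values())
    center = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    upper, lower = center + spread, center - spread
    rules = {}
    for task, emphasis in emphasis_by_task.items():
        if emphasis >= upper:
            # High emphasis: inclusion is the default, so only exclusion is documented.
            rules[task] = "document only if excluded"
        elif emphasis <= lower:
            # Low emphasis: exclusion is the default, so only inclusion is documented.
            rules[task] = "document only if included"
        else:
            # Middle range: the data are ambiguous, so either decision is documented.
            rules[task] = "document either decision"
    return rules

# Hypothetical emphasis values for four tasks.
example = {"T1": 8.0, "T2": 5.0, "T3": 5.0, "T4": 2.0}
rules = documentation_rule(example)
for task, rule in rules.items():
    print(task, rule)
```

The sketch only sorts tasks into the three documentation classes; the inclusion decision itself remains a judgment made by training personnel.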

The  next  step  involves  task  analysis--determination  of  the 
skills  and  knowledges  required  to  perform  the  task  and,  thus, 
behavioral objectives for the course. Student evaluation instruments--course
examinations--are also constructed in this step.
The  original  intent  was  for  the  task  analysis  to  be  accomplished 
through  conventional  Air  Training  Command  procedures.  However, 
at  the  same  time  that  we  were  field-testing  the  procedures  des- 
cribed here,  AFHRL  personnel  were  field-testing  a  new  handbook  for 
task analysis. Details of the AFHRL task analysis procedure are
described  in  a  paper  to  be  presented  at  this  conference  (Eschen- 
brenner,  De  Vries,  and  Ruck,  1978).    The  AFHRL  experimental  hand- 
book was  made  available  to  course  revision  personnel  to  use  as  a 
part  of  our  course  revision  project.    The  result  of  the  task  analy- 
sis will  be  a  complete  list  of  skills  and  knowledges  required  to 
perform  each  of  the  tasks  identified  for  inclusion  in  the  course. 
Behavioral  objectives  are  also  constructed  as  part  of  the  task 
analysis.    Although  not  part  of  the  AFHRL  task  analysis  procedure, 
course  examinations  will  also  be  developed  during  this  step.  The 
result  of  this  step  will  allow  each  skill  and  knowledge,  behavioral 
objective,  and  test  item  or  segment  to  be  traceable  back  to  one  or 
more  of  the  occupational  survey  tasks  which  were  identified  for 
inclusion  in  the  course. 

In  the  third  step,  course  personnel  identify  groups  of  tasks 
with common subject matter--common knowledges and skills--in order
to provide organization for the course. Then we will sum the
difficulties for all tasks in each group and divide these sums by
the sum of difficulties for all tasks being included in the course.
The result of this procedure is an approximate measure of the relative
amount of training time to be devoted to each group of tasks.
The relative training time measure is a guide for course planning
and may be overridden when appropriate. This procedure can, if
desired, be carried out for each individual task. Finally, actual
course  matter  will  be  developed  using  normal,  currently  followed 
procedures. 
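
The relative training time computation in the third step amounts to a ratio of sums. A minimal sketch follows; the function name, group names, and difficulty values are hypothetical and serve only to show the arithmetic.

```python
def relative_training_time(difficulty_by_task, group_by_task):
    """Share of course time per group: group difficulty sum / total difficulty sum."""
    total = sum(difficulty_by_task.values())
    group_sums = {}
    for task, difficulty in difficulty_by_task.items():
        group = group_by_task[task]
        group_sums[group] = group_sums.get(group, 0.0) + difficulty
    return {group: group_sum / total for group, group_sum in group_sums.items()}

# Hypothetical difficulties and subject-matter groups for three included tasks.
difficulty_by_task = {"task A": 3.0, "task B": 5.0, "task C": 2.0}
group_by_task = {
    "task A": "chamber operations",
    "task B": "chamber operations",
    "task C": "pressure suits",
}

shares = relative_training_time(difficulty_by_task, group_by_task)
print(shares)
```

Placing every task in its own group gives the per-task version of the measure mentioned in the text.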

It  is  difficult  to  predict  the  effects  of  this  procedure  on 
curriculum  content.    The  course  may  be  lengthened,  shortened, 
changed  in  other  ways,  or  remain  the  same.     In  any  case,  the 
reason  for  using  occupational  survey  data  in  curriculum  develop- 
ment is  to  insure  that  courses  are  closely  aligned  with  the  survey 
results  and  with  actual  job  requirements.    Even  if  a  course  is 
not  changed  appreciably,  indicating  that  it  was  previously  aligned 
with  survey  results,  systematic  use  of  survey  data  in  revision  of 
the  course  is  desirable  because  it  will  provide  evidence  of  this 
alignment.    Therefore,  the  first  criterion  for  success  of  the  pro- 
posed procedure  is  that  its  application  should  result  in  courses 
that  are  no  less  closely  aligned  with  occupational  survey  data 
than  conventionally  developed  courses.    This  criterion  will  be 
evaluated by examining the relationship of occupational survey
data to course content before and after revision. In addition,
test items developed under the proposed procedure will be administered
to graduates of the old and revised courses for follow-on
comparisons. Significant revisions in course content should be
reflected in the knowledge levels of course graduates. If graduates
of the old course perform as well as graduates of the revised
course on these new items, the occupational survey data probably
had no major impact on the course content. Conversely, large
test score differences could indicate significant revisions in
course content. It is recognized that factors other than use of
occupational survey data can result in score differences between
graduates  of  old  and  revised  courses.     However,  the  test  score 
data  will  be  interpreted  in  light  of  other  data  to  be  gathered. 
For  example,  if  large  test  score  differences  are  found,  but  the 
Plans of Instruction (POIs) for the old and revised courses are identical, the test score
differences  are  probably  not  due  to  use  of  occupational  survey 
data  in  course  development. 

Two other criteria which should be met for the proposed procedure
to be considered successful are efficiency and acceptability
to  users.     To  be  considered  efficient,  any  additional  resources 
required  by  the  proposed  procedure  should,  in  the  judgment  of  users, 
be  counterbalanced  by  benefits  over  and  above  those  obtained  by 
using  conventional  procedures.     To  measure  possible  additional  re- 
sources required  by  the  new  procedure  or  resource  savings  obtained 
under the new procedure, users will rate the time and other resources
required under the new procedure relative to that required
under conventional procedures. Users will also rate the benefits
of the new procedure relative to those of conventional procedures, as
well as the relative acceptability of the new procedure. This
resource and benefit data, as well as data gathered to meet the
first criterion, will be made available to the users, who will rate
whether any additional benefits obtained under the new procedure
are worth possible additional resource requirements. In order for
the application to be considered successful, the new procedure
must be at least as efficient and as acceptable as currently-used
procedures. 

We are currently testing these procedures in two job specialties--19333,
Apprentice Radio Operator; and 91130, Apprentice Aerospace
Physiology Specialist. In both of these courses, the task
selection process was completed in less than one day. Course personnel
are currently working on the second step--the task analysis.
We  had  hoped  to  be  able  to  report  here  some  results  from  our  formal 
evaluation  of  these  procedures.     However,  the  revision  efforts  are 
not  far  enough  along  in  either  of  the  two  courses  for  much  data  to 
be  available.     We  can  say  that  course  personnel   found  the  task 
selection  process  to  be  reasonably  efficient  and  are  finding  the 
results to be useful. The only real problem encountered so far
concerns the training emphasis cutoff value below which written
documentation is required only for a task to be included in the
course. Field recommended training emphasis values have an extreme
positive skew. Out of an inventory with over 400 tasks, only [...]
or 100 may have very high recommended training emphasis values.


Therefore,  the  mean  training  emphasis  is  low,  and  one  standard 
deviation  below  the  mean  is  extremely  low.    Based  on  our  experience 
to date, it appears that this cutoff should be set higher than one
standard  deviation  below  the  mean,  perhaps  at  an  absolute  value  of 
2.0. 
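
The effect of the positive skew on the lower cutoff can be seen in a small worked example. The emphasis values below are invented for illustration and are not survey data; the population standard deviation is again an assumption.

```python
from statistics import mean, pstdev

# Hypothetical, strongly skewed ratings: many low-emphasis tasks, a few very high ones.
emphasis = [0.5] * 16 + [8.0] * 4

center = mean(emphasis)
spread = pstdev(emphasis)     # assumption: population standard deviation
sd_cutoff = center - spread   # one standard deviation below the mean

# Number of tasks falling at or below each candidate lower cutoff.
below_sd_cutoff = sum(e <= sd_cutoff for e in emphasis)
below_absolute = sum(e <= 2.0 for e in emphasis)

print(center, spread, sd_cutoff)
print(below_sd_cutoff, below_absolute)
```

With these made-up values the standard-deviation cutoff falls below every rating, so no task qualifies for the simplified documentation rule, while an absolute cutoff of 2.0 captures all of the low-emphasis tasks.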

Construction  of  Specialty  Training  Standards 

An  Air  Force  Specialty  Training  Standard  (STS)  defines  train- 
ing requirements  for  an  entire  career  ladder.     First,  an  STS  serves 
as  a  specification  document  for  formal  training.    Second,  it  is  the 
basis  for  preparation  of  Career  Development  Courses  (CDCs).  Third, 
an  STS  is  a  guide  for  local  OJT  programs  and  for  preparation  of 
Job  Proficiency  Guides  used  in  OJT.     An  STS  contains  information 
concerning  the  topics  for  which  training  is  to  be  provided  at  each 
skill  level  (apprentice,  specialist,  and  technician)  in  an  Air 
Force  Job  Specialty  (AFS).     In  addition,  information  is  provided 
concerning  the  degree  of  training  to  be  provided  in  OJT  and  in 
formal  training  courses  and  concerning  reference  material  which  may 
be  used  in  training.    A  well-constructed  STS,  therefore,  provides  a 
comprehensive description of training requirements in an entire AFS,
including  data  concerning  what  tasks  are  to  be  trained  at  each  skill 
level  and  the  extent  of  the  training  to  be  provided.    A  sample  page 
from  an  STS  is  illustrated  in  figure  3.    Each  subject  area  on  an 
STS  has  skill-knowledge  codes  which  indicate  the  amount  of  training 
to  be  provided  at  each  skill  level.     Figure  4  contains  a  key  for 
these  skill-knowledge  codes. 

The purpose of this application project is to develop and field
test a systematic procedure for applying occupational survey data in
constructing  STSs.    Algorithms  will  be  tested  for  selection  of  tasks 
to  appear  on  an  STS,  for  identification  of  tasks  for  resident  train- 
ing and  for  assignment  of  skill-knowledge  codes.    The  new  field 
recommended  training  emphasis  factor  will  be  used  in  addition  to 
normal  occupational  survey  data.    This  project  is  a  direct  follow- 
on to that of Ruck, Dineen, and Cunningham (1977).

Under  this  STS  construction  procedure,  a  number  of  decisions 
are  made  using  arbitrary,  although  reasonable,  cutoff  values.  It 
is  recognized  that  these  cutoffs  may  not  be  appropriate  in  certain 
circumstances,  and  that  information  not  contained  in  the  occupational 
survey  data  may  also  be  relevant  in  making  STS  decisions.  There- 
fore, a  manual  override  option  is  allowed  at  each  decision  point 
concerning  task  selection  and  skill-knowledge  coding.  However, 
reasons  for  manual  override  will  be  documented,  and  approval  will 
be obtained from appropriate authorities.

The  first  step  involves  selection  of  the  tasks  to  appear  on  the 
STS. Also, in this step, tasks will be matched with particular skill
levels in the AFS. STS task statements will be taken directly from
the occupational survey task list. The use of occupational survey
task statements has several advantages. First, a great deal of
labor  can  be  saved,  since  a  well  written  task  inventory  provides, 
ready-made,  a  detailed,  behaviorally-oriented  breakdown  of  all  job 
activities in a particular career ladder. Secondly, use of occupational
survey tasks on an STS eliminates the problem of relating
occupational survey data, based on one job breakdown, to an STS,
which  is  usually  based  on  a  different  breakdown.     Finally,  use  of 
occupational  survey  task  statements  on  an  STS  eases  the  establish- 
ment of  the  relationship  between  occupational  survey  data  and 
other  training  documents,   such  as  course  Plans  of  Instruction, 
which  are  based  on  STSs.     The  rule  followed  for  selection  of  tasks 
is that any task with at least 10 percent members performing at
any skill level will appear on the STS and will be matched with
that skill level. In addition, tasks will be matched with all
levels  above  such  a  skill  level  (for  example,  the  five  skill  level 
will  include  all  tasks  matched  with  the  three  skill  level,  as  well 
as  tasks  not  performed  at  the  three  skill  level). 
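
The task selection rule in this first step can be sketched as follows. The function name, task names, and percentages are hypothetical, and the numeric labels 3, 5, and 7 for the apprentice, specialist, and technician levels are an assumption (the text names only the three and five skill levels by number).

```python
SKILL_LEVELS = [3, 5, 7]  # assumed labels: apprentice, specialist, technician

def match_tasks_to_levels(percent_performing, threshold=10.0):
    """Return {task: skill levels} for tasks meeting the threshold at any level.

    A task is matched with the lowest level at which at least `threshold`
    percent of members perform it, and with every level above that one.
    Tasks that never reach the threshold do not appear on the STS.
    """
    matched = {}
    for task, by_level in percent_performing.items():
        qualifying = [level for level in SKILL_LEVELS
                      if by_level.get(level, 0.0) >= threshold]
        if qualifying:
            lowest = min(qualifying)
            matched[task] = [level for level in SKILL_LEVELS if level >= lowest]
    return matched

# Hypothetical percent members performing, by skill level.
percent_performing = {
    "operate chamber": {3: 90.0, 5: 60.0, 7: 5.0},
    "conduct staff meetings": {3: 1.0, 5: 4.0, 7: 40.0},
    "rarely performed task": {3: 2.0, 5: 3.0, 7: 4.0},
}

matched = match_tasks_to_levels(percent_performing)
print(matched)
```

The manual override described earlier is not modeled here; the sketch covers only the automated cutoff.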

The  second  step  involves  selection  of  tasks  for  inclusion  in 
an  ABR  course.     The  field  recommended  training  emphasis  task  factor 
will be used to make these decisions. Any task whose field recommended
training emphasis exceeds a minimum cutoff value will be selected
for the course. The cutoff value on the
training emphasis task factor will be that which is equivalent, in
the regression sense, to 30 percent members performing among first
assignment airmen.
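
One possible reading of "equivalent, in the regression sense" is sketched below: regress training emphasis on percent members performing and take the predicted emphasis at 30 percent as the cutoff. This interpretation, the function name, and the data are assumptions made for illustration; the paper does not give the regression details.

```python
def regression_equivalent_cutoff(percent_performing, emphasis, target_percent):
    """Predicted emphasis at target_percent from a least-squares line."""
    n = len(percent_performing)
    mean_x = sum(percent_performing) / n
    mean_y = sum(emphasis) / n
    sxx = sum((x - mean_x) ** 2 for x in percent_performing)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(percent_performing, emphasis))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_percent

# Hypothetical task data lying exactly on the line emphasis = 0.1 * percent.
tasks = ["T1", "T2", "T3", "T4"]
percent = [10.0, 20.0, 40.0, 80.0]
emphasis = [1.0, 2.0, 4.0, 8.0]

cutoff = regression_equivalent_cutoff(percent, emphasis, 30.0)
selected = [t for t, e in zip(tasks, emphasis) if e > cutoff]
print(cutoff, selected)
```

Real survey data would scatter around the fitted line, so the cutoff would be an estimate rather than an exact equivalence.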

The  last  step  involves  assignment  of  skill-knowledge  codes  to 
the STS. Two different procedures will be tested for skill-knowledge
code assignment. A separate STS will be constructed using
each of the two procedures, allowing a direct comparison of the
procedures. Examples of STSs constructed under these two procedures
are contained in figures 5-6.

(1) Under the first procedure, a skill-knowledge code will be
assigned to each task for each skill level with which the task is
associated. These skill-knowledge codes will be assigned through
currently  followed  procedures. 

(2)  The  second  skill-knowledge  code  assignment  procedure  is 
based  on  the  "go-no  go"  philosophy.     Under  this  philosophy,  an  OJT 
trainer  signs  off  an  STS  area  when  an  airman  reaches  the  "go"  level 
of  performance  in  that  area.     No  gradations  of  performance  or  know- 
ledge are  recognized  in  OJT  beyond  the  "go"  and  "no  go"  levels. 
Herein,  assignment  of  skill-knowledge  codes  to  STS  areas  is  assumed 
unnecessary  for  OJT  purposes.     However,  skill-knowledge  codes  will 
still be needed to reflect partial training on particular tasks
which may be given in an ABR course. Therefore, the following
skill-knowledge code assignment procedure will be used only to
assign codes for tasks included in an ABR course: Any task whose
field recommended training emphasis exceeds a value which is equivalent,
in the regression sense, to 50 percent members performing
will be assigned the 2b code. All other tasks included in the
ABR course will be assigned the 1a code. As with all other automated
decisions, those made in this step may be modified through
manual override.
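
The code assignment under the "go-no go" procedure reduces to a single comparison per task. In this sketch the function name, task names, emphasis values, and the cutoff are hypothetical; the cutoff stands in for the emphasis value equivalent to 50 percent members performing, which is assumed to have been computed beforehand.

```python
def assign_go_no_go_codes(abr_task_emphasis, cutoff_50):
    """Assign 2b to ABR course tasks above the cutoff and 1a to all others."""
    return {task: ("2b" if emphasis > cutoff_50 else "1a")
            for task, emphasis in abr_task_emphasis.items()}

# Hypothetical tasks already selected for the ABR course, with their emphasis.
abr_task_emphasis = {
    "operate chamber": 7.7,
    "suit up crew members": 4.2,
    "proofread forms": 3.1,
}

# Hypothetical cutoff standing in for the 50 percent equivalent emphasis.
codes = assign_go_no_go_codes(abr_task_emphasis, cutoff_50=5.0)
print(codes)
```

Any code produced this way remains subject to the manual override described in the text.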

Under the conventional, skill-knowledge approach, the differences
among the various skill levels are mainly a matter of degree--
of how well airmen can perform various tasks. Under the "go-no go"
approach to be tested, differences among skill levels are not a
matter of "how well", but a matter of what tasks are performed.
For example, a specialist-level airman can perform all tasks that
an apprentice-level airman can perform, as well as some additional
tasks.

An  evaluation  will  be  conducted  of  these  experimental  STS 
construction  procedures.    The  evaluation  will  involve  comparisons 
among  three  STSs:    a  "go-no  go"  experimental  STS,  an  experimental 
STS with conventional skill-knowledge coding, and the conventional
STS. The most important criterion in the evaluation will be acceptability
to users. This criterion will be measured through surveys
of all classes of users--formal trainers, OJT trainers and supervisors,
career field functional managers, etc. Another criterion
will  be  the  ease  with  which  the  proposed  procedures  can  be  followed. 
This  will  be  evaluated  through  surveys  of  personnel  who  are  directly 
involved  with  the  construction  of  the  experimental  STSs. 

These  STS  construction  procedures  are  currently  being  applied 
in  three  job  specialties.    We  had  hoped  to  be  able  to  present  some 
results  from  the  formal  evaluations,  but  STS  construction  is  not 
yet  complete  in  any  of  the  specialties.    However,  in  one  of  the 
specialties, 911X0, Aerospace Physiology, one of the experimental
STSs is complete (with conventional skill-knowledge coding), and
the other is nearing completion. We are able to offer some observations
from that experience. The actual selection of tasks was
completed  in  less  than  a  day  by  training  personnel.    The  remainder 
of  the  procedure  (assignment  of  skill-knowledge  codes,  grouping 
tasks  into  subject  areas,  revising  some  tasks,  etc)  was  completed 
with  about  15  hours  of  labor.    Course  personnel  are  very  satisfied 
with the results. Also, an Instructional Systems Design specialist,
who serves as a consultant to the entire school in which the Aerospace
Physiology courses are located, reviewed the experimental STS
and  indicated  that  he  liked  the  STS.    It  seems  fair  to  say,  based  on 
this  preliminary  information,  that  formal  training  personnel  like 
the experimental STSs, at least the conventionally skill-knowledge
coded STS. However, no information is yet available concerning how
other users, such as OJT trainers, will like the experimental STS.

Summary

Two  sets  of  procedures  for  using  occupational  survey  data,  in- 
cluding data  on  the  new  field  recommended  training  emphasis  scale, 
in  training  decision-making  are  currently  being  field-tested.  Both 
sets  are  designed  to  provide  simple  and  efficient  methods  for  trans- 
lating occupational  survey  data  into  training  content.    The  first 
set of procedures has the limited goal of revising (or constructing)
an  apprentice-level  resident  training  course.    The  second  set  has 
the  more  ambitious  goal  of  determining  all  training  requirements, 
both resident and OJT, in an entire career ladder. Results of the
planned formal evaluations are not yet available. However, both
sets of procedures are currently being applied in several job
specialties; the experience obtained to date suggests that both
sets of procedures will be useful.

FIELD RECOMMENDED TRAINING EMPHASIS SCALE

Check each task for which you recommend formal training for
first-term airmen.

Rate only the tasks you checked to indicate how much formal training
emphasis you recommend for first-term airmen.

1      EXTREMELY LITTLE
2
3
4
5      AVERAGE
6
7
8
9      EXTREMELY  HEAVY 
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COURSE REVISION PROJECT
COMPUTER PRINTOUT

TASK                                                          SEQUENCE    EMPHASIS    DIFFICULTY    % MEMBERS
                                                              NUMBER                                PERFORMING

F 195  Serve as inside observer on training chamber flights   1           7.99        3.89          96.9
F 104  Serve as lock operator on training chamber flights     2           7.79        4.54          93.8
F 101  Serve as chamber operator on training chamber flights  3           7.74        3.97          93.8
H 193  Perform structure tests of pressure suit gloves        [illeg.]    2.62        4.77          3.1
H 236  Suit up crew members with pressure suits               [illeg.]    2.59        4.39          [illeg.]
E 87   Proofread correspondence, reports or forms             [illeg.]    2.51        4.01          23.1
A 2    Act as training program advisor at staff level         342         .10         5.89          10.8
D 66   Develop resident course curricula materials            343         .10         5.59          3.1
B 23   Conduct staff meetings                                 344         .06         4.91          3.1
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TASKS, KNOWLEDGES AND STUDY REFERENCES

[Form columns not reproduced: PROFICIENCY LEVEL, PROGRESS RECORD AND CERTIFICATION for the 3, 5, and 7 skill levels, each with AFSC/OJT, Date OJT Started, and Date Completed & Trainee's Supervisor's Initials columns. Proficiency codes legible in the scan (1a/a, 1a, 2b, 3b, 4c) cannot be matched to rows.]

20. ANIMAL SERVICE AND ZOONOTIC DISEASE CONTROL

SR: AFMs 125-5 (chap 4, vol 1), 160-12, 160-37 [remainder of list illegible];
AFP 163-1-3, 163-10 (sec A & B); Catcott, E. J., Animal Hospital Technology,
American Veterinary Publications (AVP), 1971; Benbrook, E. A., and M. W. Sloss,
Veterinary Clinical Parasitology, Iowa State University Press, 3rd ed, 1961

a. Principles of animal care, management,
medicine and surgery

b. Principles of identification and control of
zoonotic and other diseases of animals (including
controlling entry of foreign animal
diseases into the US)

c. Assist in the zoonoses control program

d. Assist in the management, veterinary care,
treatment and necropsy of government owned
animals

e. Prepare reports and maintain records pertaining
to veterinary care of:

(1) Privately owned animals

(2) Government owned animals

f. Perform laboratory and/or clinical procedures
related to control of animal and
zoonotic diseases

g. Procedures for evaluation and decontamination
of military working dogs exposed to nuclear,
biological or chemical agents

21. ANIMAL TECHNICIAN SPECIALTY (For personnel
assigned duties as an Animal Technician, SEI 491;
exclude from consideration in development of
SKT and CDC)

a. Occupational health and safety

SR: AFP 161-25; AFRs 92-1, 127-101, 127-4,
161-6, 161-8, 161-18, 161-24

(1) Injury and zoonotic disease hazards in
the research animal colony

(2) Apply appropriate occupational safety
practices

b. Medical terminology

SR: AFM 160-56 (chap 2); American Association for Laboratory Animal Science,
Pub 67-3, Manual for Laboratory Animal Technicians, 196[illegible]
(hereafter listed as AALAS Pub 67-3); Purina Manual; Worden, A. N. and
Lane-Petter, W., eds: The UFAW Handbook on the Care and Management of
Lab Animals, 4th ed, The Universities Federation for Animal Welfare, 1972
(hereafter listed as the UFAW Handbook)

(1) Medical terminology relating to anatomy
and physiology

(2) Disease

(3) Surgery

(4) Axenic animals
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QUALITATIVE REQUIREMENTS

STS 30[illegible]X0

PROFICIENCY CODE KEY

SCALE VALUE    DEFINITION: The individual

TASK PERFORMANCE LEVELS

1  Can do simple parts of the task. Needs to be told or shown how to do most of
   the task. (EXTREMELY LIMITED)

2  Can do most parts of the task. Needs help only on hardest parts. May not meet
   local demands for speed or accuracy. (PARTIALLY PROFICIENT)

3  Can do all parts of the task. Needs only a spot check of completed work. Meets
   local demands for speed and accuracy. (COMPETENT)

4  Can do the complete task quickly and accurately. Can tell or show others how
   to do the task. (HIGHLY PROFICIENT)

* TASK KNOWLEDGE LEVELS

a  Can name parts, tools, and simple facts about the task. (NOMENCLATURE)

b  Can determine step by step procedures for doing the task. (PROCEDURES)

c  Can explain why and when the task must be done and why each step is needed.
   (OPERATING PRINCIPLES)

d  Can predict, identify, and resolve problems about the task. (COMPLETE THEORY)

** SUBJECT KNOWLEDGE LEVELS

A  Can identify basic facts and terms about the subject. (FACTS)

B  Can explain relationship of basic facts and state general principles about
   the subject. (PRINCIPLES)

C  Can analyze facts and principles and draw conclusions about the subject.
   (ANALYSIS)

D  Can evaluate conditions and make proper decisions about the subject.
   (EVALUATION)

- EXPLANATIONS -

*   A task knowledge scale value may be used alone or with a task performance
    scale value to define a level of knowledge for a specific task.
    (Examples: b and 1b)

**  A subject knowledge scale value is used alone to define a level of knowledge
    for a subject not directly related to any specific task, or for a subject
    common to several tasks.

-   This mark is used alone instead of a scale value to show that no proficiency
    training is provided in the course, or that no proficiency is required at
    this skill level.

X   This mark is used alone in course columns to show that training is not given
    due to limitations in resources.
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CTVTEHTATIVE  STS  PROFICIENCY  CODE  KEY 
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h. Teach post-flight chamber flight procedures

i. Teach procedures during chamber flights

11. HYPOBARIC CHAMBER MAINTENANCE AND INSPECTION

SR: T.Os. A3D8-3-1-101, A3D8-3-2-6

a. Perform daily inspections of low pressure chambers

b. Perform periodic inspections of low pressure chambers

c. Perform special inspections of low pressure chambers

d. Recharge batteries for emergency intercom systems

e. Remove or replace fluorescent tubes inside low
pressure chambers

f. Remove or replace operator panel instruments

g. Remove or replace oxygen equipment items on low
pressure chambers

h. Add oil to vacuum pumps

SR: T.Os. 34Y5-3-29-4, 34Y5-3-35-1

i. Solder breaks in intercom wiring

j. Prepare or maintain records on status or inspections
of equipment

SR: T.Os. 00-20-5, 00-20-7

12. LIFE SUPPORT EQUIPMENT FUNCTIONS

SR: AFP 160-5 (chap 13 and 14)

a. Fit oxygen masks

SR: T.O. 15X-4-4-12

b. Fit parachutes

SR: T.Os. 14D1-1-1, 14D1-2-1

Proficiency codes as scanned, in column order: 2b/1a, 2b/1a, 2b, 2b, 2b,
2b/-, 2b/-, 2b/-, 2b/-, 2b/-, 2b/-, 2b, 3c/2b, 2b/1a
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TASKS, KNOWLEDGES AND STUDY REFERENCES

h. Teach post-flight chamber flight procedures

i. Teach procedures during chamber flights

11. HYPOBARIC CHAMBER MAINTENANCE AND INSPECTION

SR: T.Os. A3D8-3-1-101, A3D8-3-2-6

a. Perform daily inspections of low pressure chambers

b. Perform periodic inspections of low pressure chambers

c. Perform special inspections of low pressure chambers

d. Recharge batteries for emergency intercom systems

e. Remove or replace fluorescent tubes inside low
pressure chambers

f. Remove or replace operator panel instruments

g. Remove or replace oxygen equipment items on low
pressure chambers

h. Add oil to vacuum pumps

SR: T.Os. 34Y5-3-29-4, 34Y5-3-35-1

i. Solder breaks in intercom wiring

j. Prepare or maintain records on status or inspections
of equipment

SR: T.Os. 00-20-5, 00-20-7

12. LIFE SUPPORT EQUIPMENT FUNCTIONS

SR: AFP 160-5 (chap 13 and 14)

a. Fit oxygen masks

SR: T.O. 15X-4-4-12

b. Fit parachutes

SR: T.Os. 14D1-1-1, 14D1-2-1
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X 

X 

X/la 
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V 
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THE  STABILITY  OVER  TIME  OF  AIR  FORCE  ENLISTED  CAREER 
LADDERS  AS  OBSERVED  IN  OCCUPATIONAL  SURVEY  REPORTS 


Walter  E.  Driskill,  Ph.D. 
and 

Frederick  B.  Bower,  Jr.,  Capt,  USAF 


USAF  Occupational  Measurement  Center 
Occupational  Survey  Branch 
Lackland  AFB  TX,  78236 

A basic assumption behind the Air Force occupational survey has
been that advances in technology and improvements in management
procedures and techniques create, over time, changes in the types of jobs
performed within a given occupational specialty. Through the
occupational survey, these changes could be identified and the
appropriate updating of classification documents and training programs
would then be made so that individuals in that occupation are trained
and utilized in the most efficient manner. Research seems to indicate
that the program has been pointed toward the identification of change in
Air Force jobs since its early development days.

One objective of the Air Force program as described by Morsh (1964)
is the identification of job changes and the determination of training
needs. He determined this during reliability studies of the job
inventory methods of occupational survey, although, as Prien and Ronan
(1971) point out, the logical research extensions are not reported.
This was also the premise of Christal (1969) in his reliability studies
of the job inventory. Both assumed that since reliability varies
depending on the time interval between ratings, changes in the job
survey would be noted over time. However, this early emphasis on
identifying change in order to show the reliability of the survey
instrument may have led those within the program away from identifying
job stability. As pointed out by Driskill, Keeth, and Mitchell (1978),
the USAF Occupational Measurement Center has now been in existence long
enough to have resurveyed many enlisted career ladders for the second
and sometimes third time. As such, our perceptions of how to approach
the analysis of occupational survey data are changing. Like Morsh and
Christal, we see change in the areas of time requirements and task
occurrence, but we are also seeing stability in the job structure of many
career ladders as evidenced in the recent surveys. Of the 76
occupational surveys of enlisted career ladders conducted between 1
January 1977 and 30 June 1978, 71 were resurveys of existing ladders,
and 59 of these ladders were found to have remained essentially stable over the
time since the previous survey. Seven career ladders were identified as
having changed to some degree, but none had changed to any great extent.
No determination of stability or change could be made for the remaining
five because either radical differences between formats of the survey
instruments or different approaches to the job analysis by the survey
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analysts  made  comparisons  too  difficult.  It  should  be  pointed  out  that 
this  comparison  between  surveys  is  now  made  as  a  routine  part  of  every 
survey  analysis.  The  determination  of  career  ladder  stability  is  made 
by  the  survey  analyst  based  on  the  data  collected.  Nineteen  analysts 
working  independently  of  one  another  determined  the  stability  of  these 
59  career  ladders  as  a  part  of  their  normal  job  and  not  as  any  sort  of 
special  project  or  study. 
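
The counts reported above can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python; the variable names are ours, and the category labels are shorthand for the three outcomes described in the text.

```python
# Counts reported in the text for career ladders resurveyed
# between 1 January 1977 and 30 June 1978.
resurveyed = 71
outcomes = {
    "stable": 59,
    "changed to some degree": 7,
    "undetermined": 5,
}

# The three outcome categories should account for every resurveyed ladder.
assert sum(outcomes.values()) == resurveyed

# Share of resurveyed ladders in each category, in percent.
shares = {label: round(100 * n / resurveyed, 1) for label, n in outcomes.items()}

for label, pct in shares.items():
    print(f"{label}: {outcomes[label]} of {resurveyed} ({pct}%)")
```

On these figures, roughly five of every six resurveyed ladders were judged stable.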

To  illustrate  just  how  stable  career  ladders  can  appear,  two  such 
specialties  will  be  used  to  display  the  various  comparisons  that  can  be 
made  to  determine  stability  over  time  between  surveys.    These  career 
ladders were chosen for ease of data display and because the jobs performed
in the specialties are readily understood both inside and outside
the military community. The two career specialties chosen as examples
are Dental Laboratory Personnel and Air Force Recruiters. Dental
Laboratory Personnel are responsible for the fabrication and repair of
dental prostheses such as complete dentures, partial dentures, bridges,
and crowns. Air Force Recruiters are responsible for contacting,
interviewing,  and  smoothly  processing  prospective  applicants  for  active 
duty  Air  Force  service. 

The first comparison that can be made between surveys is that of
career ladder structure. This is the job structure of the career
specialty determined on the basis of what people are actually doing in
the field. The job groups are determined through computer analysis
using the Comprehensive Occupational Data Analysis Programs (CODAP).
CODAP groups jobs according to the similarity of respondents' responses
on the job tasks performed and the amount of time spent performing those
tasks. Table 1 depicts the comparison of the Dental Laboratory career
ladder structure between the April 1974 survey and the June 1978 survey.
Every job identified in the first survey can also be found in the career
ladder structure in the current survey. The differences in groupings
are merely a function of each survey analyst's preference in choice of
reporting points. Some analysts prefer to report small individual job
groups while others prefer to report larger job clusters.
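
The grouping idea can be illustrated with a small sketch. This is not the CODAP software: it assumes a percent-time overlap measure (the sum, over tasks, of the smaller of two respondents' percent time) and a simple average-linkage merge rule. The respondent names, task names, ratings, and the 50 percent threshold are all hypothetical.

```python
from itertools import combinations


def percent_time(ratings):
    """Convert relative time-spent ratings into percent of total time per task."""
    total = sum(ratings.values())
    return {task: 100.0 * r / total for task, r in ratings.items() if r > 0}


def overlap(a, b):
    """Percent-time overlap: sum over shared tasks of the smaller percent time."""
    return sum(min(a[t], b[t]) for t in a.keys() & b.keys())


def group_jobs(profiles, threshold=50.0):
    """Merge the most similar groups until no pair reaches the threshold."""
    pct = {name: percent_time(r) for name, r in profiles.items()}
    groups = [[name] for name in pct]

    def mean_overlap(g1, g2):
        pairs = [(x, y) for x in g1 for y in g2]
        return sum(overlap(pct[x], pct[y]) for x, y in pairs) / len(pairs)

    while len(groups) > 1:
        (i, j), best = max(
            (((i, j), mean_overlap(groups[i], groups[j]))
             for i, j in combinations(range(len(groups)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        groups[i] = groups[i] + groups[j]  # i < j, so deleting j is safe
        del groups[j]
    return [sorted(g) for g in groups]


# Hypothetical respondents with relative time-spent ratings per task.
profiles = {
    "resp_1": {"set denture teeth": 6, "flask dentures": 4},
    "resp_2": {"set denture teeth": 5, "flask dentures": 5},
    "resp_3": {"wax crowns": 7, "cast bridges": 3},
    "resp_4": {"wax crowns": 5, "cast bridges": 4, "flask dentures": 1},
}

job_groups = group_jobs(profiles, threshold=50.0)
print(job_groups)
```

The threshold plays the role of the analyst's choice of reporting point: a higher value yields small individual job groups, a lower value yields larger clusters.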

Another point to be brought out in Table 1 is the decrease from
the  previous  study  in  the  number  of  personnel  fabricating  removable 
partial  dentures  and  the  increase  in  the  current  study  of  personnel 
fabricating    crowns,    bridges    and   porcelain  products.     As  dental 
technology   has    improved   the   quality  and  appearance  of  prosthetic 
implants,   demand  for  these  products  has  increased  while  the  use  of 
removable  partial  dentures  has  decreased.     However,  some  patients  will 
always  require  removable  partial  dentures  for  one  reason  or  another,  so 
the  job  of  their  fabrication  will  not  go  away.    Therefore,  the  job 
structure  within  the  Dental  Laboratory  career  ladder  remains  stable  even 
though  the  number  of  personnel  working  in  particular  jobs  has  changed. 

Table  2  shows  the  comparison  of  the  career  ladder  structure  between 
surveys  of  recruiter  personnel.     The  job  ladder  here  is  remarkably 
similar  considering  the  extensive  revision  and  reorganization  of  the 
survey  instrument  used  to  collect  the  data  for  the  current  study.  The 
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improvements  in  the  job  inventory  resulted  in  the  identification  of  the 
Production Management and Classification Interviewer jobs. However,
recruiter personnel revealed that these jobs had existed at the time of the
first survey, but tasks had not been included in that job inventory to
capture them. This further tended to verify the stability of this
specialty.

Another comparison made to determine career ladder stability is
that of the percent time spent performing various duties of the job.
Since none of the duty titles changed between development of the job
inventories used to survey Dental Laboratory personnel, Table 3
provides a good example of this comparison. The differences in time
spent fabricating and repairing removable partial dentures and
fabricating porcelain products were explained previously. The 1-24 month
active federal military service group was chosen to further illustrate
stability of the initial job assignment in this career ladder, in that
there is no carry-over of personnel from the previous survey to the
current survey.

A  comparison  of  the  percent  of  members  performing  tasks  between 
surveys  is  also  used  to  determine  stability  of  jobs  over  time.    Table  4 
shows  this  comparison  of  tasks  for  Dental  Laboratory  personnel  with  1-24 
months  active  federal  military  service.     Again,  despite  the  completely 
different  makeup  of  each  sample  group,  the  percent  of  members  performing 
each  task  is  comparable. 

Also shown in Table 4 is a comparison of the difficulty of each
task between surveys. Task difficulty is determined by asking
experienced personnel in the job specialty to rate each task in the
survey instrument on the basis of how long it takes to learn to do the
task. A nine-point scale is used, ranging from "one" (a very small amount
of time needed to learn the task) to "nine" (a very large amount of
time needed to learn the task). The ratings are then computer adjusted so that
tasks of average difficulty have ratings of 5.00. Task difficulty
ratings are accomplished for each survey, and the sample chosen to
perform the ratings is selected at random. Therefore, the high degree
of similarity in task difficulty ratings is evidence that the
perceptions of the difficulty of jobs within this particular career
ladder have not altered over time.
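
One way such an adjustment could work is sketched below. The text states only that tasks of average difficulty receive a rating of 5.00; the linear rescaling used here, and the target standard deviation of 1.00, are assumptions made for illustration, as are the task names and raw values.

```python
from statistics import mean, pstdev


def adjust_difficulty(raw_means, target_mean=5.0, target_sd=1.0):
    """Linearly rescale mean task ratings so their average equals target_mean.

    target_sd is an assumed spread; the source states only the 5.00 average.
    """
    values = list(raw_means.values())
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # All tasks rated alike: every task is of average difficulty.
        return {task: target_mean for task in raw_means}
    return {
        task: target_mean + target_sd * (r - m) / s
        for task, r in raw_means.items()
    }


# Hypothetical raw mean ratings on the nine-point scale.
raw = {"task_a": 2.0, "task_b": 4.0, "task_c": 6.0}
adjusted = adjust_difficulty(raw)

for task, value in adjusted.items():
    print(f"{task}: raw {raw[task]:.2f} -> adjusted {value:.2f}")
```

Because the rescaling is linear, the rank order of tasks is unchanged; only the reference point moves, so ratings from different surveys share a common average.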

Prior to 1974, task difficulty ratings were not adjusted. Rather,
the raw average scores were utilized. As Table 5 illustrates, even when
comparing raw scores to adjusted scores, the order of task difficulty
remains relatively the same.
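
Agreement in task order between two sets of ratings can be summarized with a rank-order correlation. The sketch below computes Spearman's coefficient in plain Python; this statistic is our illustration and is not reported in the paper, and the raw and adjusted values are hypothetical rather than taken from Table 5.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical difficulty ratings for the same five tasks in two surveys.
raw_scores = [3.2, 4.1, 5.6, 6.0, 2.5]
adjusted_scores = [4.3, 4.9, 6.1, 6.5, 3.8]

rho = spearman(raw_scores, adjusted_scores)
print(f"rank-order correlation: {rho:.2f}")
```

A coefficient near 1 indicates that the tasks fall in nearly the same order of difficulty in both surveys, even though the raw and adjusted scales differ.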

The final comparison made for career ladder stability is that of
job skill level. Table 6 depicts a different specialty than the
previous examples, but one chosen because it spans nearly 10 years
between the first and the current surveys. As illustrated, 5-skill
level Inventory Management Specialists have remained relatively constant
in the percent of members performing the various tasks relative to their
jobs. Only in the area of operating data processing equipment has
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there been a steady rise in the number of personnel performing those
tasks. As the Air Force supply function became more automated, such an
occurrence was naturally expected.

As shown, the determination that a career ladder is stable is more
than just identifying like job groups. It is an in-depth comparison
between surveys, covering not only the career ladder structure but also
skill level and time-in-service groups and task difficulty.

The  implications  of  identifying  so  many  stable  career  ladders  are 
varied  and  complicated.    Certainly  classification  and  training  personnel 
will be better able to manage their resources and training programs with
this  knowledge.    However,  these  managers  must  not  let  themselves  neglect 
stable  career  ladders.     Even  in  the  most  stable  of  career  areas,  as 
technology    improves    and    the    Air   Force  acquires   new  and  more 
sophisticated  weapon  systems  and  equipment,  utilization  patterns  and 
training  needs  will  change.    Certainly  stable  career  ladders  need  not  be 
surveyed  as  frequently  as  they  may  have  been  in  the  past.    However,  we 
must  remain  responsive  to  changes  in  the  field  and  always  be  prepared  to 
provide  timely  data  on  any  career  ladder  if  the  requirement  arises. 
Certainly  the  verification  of  career  ladder  stability  will  allow  survey 
analysts    the    time    to    broaden    their  horizons   and   explore  the 
possibilities   of   other  uses  and  applications  of  the  survey  data. 
However,  analysts  must  never  lose  sight  of  the  fact  that  the  foundation 
of  an  occupation  is  the  job  structure,  and  that  job  structure  has  to  be 
identified  in  order  to  properly  interpret  any  of  the  other  factors 
relating  to  the  personnel  performing  in  that  career  specialty.  While 
the  concept  of  career  fields  is  utilized  primarily  by  the  military,  as 
McCormick  (1976)  points  out  there  are  many  civilian  areas  that  could 
also  be  viewed  as  career  fields.    As  such,  job  stability  is  very  likely 
within  the  civilian  community  as  well.    Like  military  managers,  civilian 
personnel    utilizing   occupational   survey   data   must   guard  against 
identifying  a  stable  job  area  and  then  failing  to  continue  to  monitor  it 
for  change  in  the  future. 

The  apparent  stability  of  the  majority  of  jobs  in  the  Air  Force 
enlisted  career  structure  has  only  recently  been  identified.    There  is 
much to do in this area before such data can be fully exploited. For
example, job stability must be defined and objective criteria
established so that stability may be determined. Even now, though, the
concept of stability within Air Force career ladders is having an impact on the
Occupational Survey Program and on the use of occupational data in
classification, training, construction of career development courses,
testing, and other USAF personnel programs.
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AIR  FORCE  SPECIALTIES  IDENTIFIED  AS  STABLE  THROUGH  OCCUPATIONAL  SURVEYS 
CONDUCTED  JANUARY  1977  THROUGH  JUNE  1978 
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AFSC           SPECIALTY            CURRENT SURVEY    PREVIOUS SURVEY

[illegible]    DIET THERAPY         MAR 78            OCT 73
701X0          CHAPEL MANAGEMENT    MAY 78            DEC 73
901X0          AEROMEDICAL          MAR 77            NOV 71
982X0          DENTAL LABORATORY    JUN 78            APR 74
99500          RECRUITER            MAY 78            MAR 73
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TABLE  1 

COMPARISON OF CAREER LADDER STRUCTURE BETWEEN SURVEYS OF
AFS 982X0 DENTAL LABORATORY PERSONNEL

APR 74 SURVEY (N illegible)

  COMPLETE DENTURE CLUSTER (N=191)
  WORKING SUPERVISION CLUSTER (N=103)
  ORTHODONTIC JOB TYPE (N=15)
  CROWN AND BRIDGE CLUSTER (N=56)
  METAL FINISHING CLUSTER (N=51)
  WAX-UP CLUSTER (N=22)
  AREA LAB SUPERVISION CLUSTER (N=13)

JUN 78 SURVEY

  BASE DENTAL LAB PERSONNEL (N=307)
  ORTHODONTIC APPLIANCE SPECIALISTS (N=9)
  CROWN AND BRIDGE FABRICATORS (N=97)
  PORCELAIN FABRICATORS (N=15)
  REMOVABLE PARTIAL DENTURES FABRICATORS (N illegible)
  DENTAL LAB MANAGERS (N=29)
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TABLE  2 


COMPARISON OF CAREER LADDER STRUCTURE BETWEEN SURVEYS OF RECRUITER PERSONNEL

[...] SURVEY (N=1665)
(entries are cut off at the left margin of the scan)

  RECRUITER CLUSTER (N illegible)
  [...]RUITER CLUSTER
  [...]RUITER (N=7)
  [...] CLUSTER (N illegible)
  [...] NCO (N=61)
  [...]SING AND PUBLICITY [...] (N illegible)
  [...] CLUSTER (N=23)
  [...] CLUSTER (N=86)

MAY 78 SURVEY (N illegible)

  RECRUITER SALESMEN (N=1127)
  RECRUITER MANAGEMENT PERSON[...]
  AFEES LIAISON NCOs (N=166)
  ADVERTISING AND PUBLICITY NCOs (N=29)
  TECHNICAL SCHOOL INSTRUCTORS
  PRODUCTION MANAGEMENT PERSON[...]
  CLASSIFICATION INTERVIEWERS

TABLE 3

COMPARISON OF PERCENT OF TIME SPENT PERFORMING DUTIES BETWEEN SURVEYS OF
AFS 982X0 DENTAL LABORATORY PERSONNEL
(1-24 MONTHS ACTIVE FEDERAL MILITARY SERVICE)

Columns: DUTY; APR 74 SURVEY; JUN 78 SURVEY

  ORGANIZING AND PLANNING
  DIRECTING AND IMPLEMENTING
  INSPECTING AND EVALUATING
  TRAINING
  PERFORMING ADMINISTRATIVE AND SUPPLY TASKS
  PERFORMING GENERAL LABORATORY TASKS
  FABRICATING AND REPAIRING COMPLETE DENTURES
  FABRICATING AND REPAIRING REMOVABLE PARTIAL DENTURES
  FABRICATING CROWNS, INLAYS AND FIXED PARTIAL DENTURES
  FABRICATING PORCELAIN PRODUCTS
  FABRICATING AND REPAIRING ORTHODONTIC APPLIANCES
  FABRICATING SPECIAL PROSTHESES

[footnote marker illegible] INDICATES LESS THAN 1 PERCENT

[Percent values as scanned, in reading order, with "?" for illegible entries:
1, 1, 2, 2, ?, 1, ?, 1, 1, ?, ?, 48, 53, 13, 10, 16, 10, 9, 1, 7, ?, 1, ?.
Their assignment to duties and survey columns is not recoverable.]
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[Tables 4 and 5 (comparisons of percent of members performing tasks and of task difficulty between surveys of Dental Laboratory personnel with 1-24 months active federal military service) are not legible in the scan.]
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TABLE 6

COMPARISON OF PERCENT OF MEMBERS PERFORMING TASKS BETWEEN SURVEYS OF
5-SKILL LEVEL INVENTORY MANAGEMENT PERSONNEL

TASKS                                              DEC 68    OCT 72    JUL 78
                                                   SURVEY    SURVEY    SURVEY

MAINTAIN SUSPENSE FILES                            [illeg.]  [illeg.]  18
COUNT PROPERTY                                     26        20        18
PREPARE ISSUE DOCUMENTS                            26        19        19
COMPARE PHYSICAL COUNTS OF PROPERTY WITH STOCK
  RECORD BALANCES                                  18        16        16
PLACE LOCATION SYMBOLS ON STORAGE FACILITIES       11        5         [illeg.]
PREPARE TURN-IN DOCUMENTS                          10        11        18
ESTABLISH BENCH STOCKS                             9         6         11
OPERATE REMOTE KEYBOARD UNITS                      15        25        39

[In rows with illegible entries, the column positions of the legible values are uncertain.]
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The  Collection  and  Prediction  of 
Training  Emphasis  Ratings  for  Curriculum  Development 

by 

Hendrick  W.  Ruck 
Nancy  A.  Thompson 
David  C.  Thomson 

Air  Force  Human  Resources  Laboratory 
Brooks  AFB,  Texas 

The  opinions  and  conclusions  expressed  in  this  paper 
are  those  of  the  authors  and  are  not  necessarily 
those  of  the  United  States  Air  Force. 

One  of  the  most  difficult  questions  that  arises  in  occupational 
curriculum  design  is,  "What  should  the  training  content  be?"  This 
question,  in  the  business  of  Air  Force  vocational  training,  could  be 
further  reduced  to  the  fundamental  questions  of  "Which  occupational 
tasks  should  be  included  in  the  curriculum?"  and  "How  do  those  tasks 
translate  into  specific  skills  and  knowledge?"   The  purpose  of  this 
paper  is  to  address  the  first  of  the  fundamental  questions;  i.e.,  the 
selection  of  tasks  for  training. 

The  Air  Force  Human  Resources  Laboratory  has  been  conducting 
extensive research in the training requirements area. The initial concepts
and theory guiding the research were first proposed by Christal
(1970), who suggested that boards of expert judges could be used to
study  information  about  tasks  that  are  hypothesized  to  be  related  to 
the  training  decision.    The  experts  could  then  evaluate  those  tasks  in 
terms  of  the  appropriateness  for  inclusion  in  curricula.    He  further 
suggested  that  the  mathematical  technique  of  policy  capturing  be 
applied  to  the  judges'  decisions  so  that  the  policy  of  the  judges  could 
be  applied  to  additional  tasks.    This  approach  would  reduce  the  necessity 
of  expert  judgment  in  task  selection  for  each  task  and  would  assure 
more  consistent  decisions  since  the  mathematical  model  of  the  experts' 
decisions  could  be  used  instead  of  additional  judgments  by  the  experts. 
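
The policy-capturing idea can be sketched as an ordinary least-squares regression of the judges' mean ratings on task factors, with the fitted weights then applied to tasks the judges never saw. The sketch below is illustrative only: it uses two hypothetical factors (percent members performing and learning difficulty) rather than the factor sets used in the research, and the data are constructed so that the underlying policy is known.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_policy(factors, ratings):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(f) for f in factors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, task_factors):
    """Apply the captured policy to a task the judges did not rate."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], task_factors))


# Hypothetical tasks: (percent members performing, learning difficulty).
factors = [(90, 4), (50, 6), (20, 3), (70, 5), (10, 7)]
# Hypothetical mean judge ratings, generated as 1 + 0.05*percent + 0.5*difficulty.
ratings = [7.5, 6.5, 3.5, 7.0, 5.0]

weights = fit_policy(factors, ratings)
new_task_rating = predict(weights, (60, 5))
print([round(w, 3) for w in weights], round(new_task_rating, 2))
```

With real judgments the fit would not be exact, and the multiple correlation between predicted and observed ratings would indicate how well the equation captures the judges' policy.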

These  initial  suggestions  have  been  studied  in  a  stream  of  research 
on task training factors. Mial and Christal (1974) developed a number
of task training factors and were able to predict judges' mean rank
ordering of tasks for priority in training using a four-factor regression
equation (R = .88; p < .001). Their research was conducted using the
Medical Service specialty. Mead (1975) has presented additional evidence
as to the utility of the policy-capturing approach. He performed
a  similar  study  to  that  of  Mial  and  Christal  using  a  different  specialty 
(Law  Enforcement)  and  met  with  similar  success  in  predicting  training 
priorities.    Mead  successfully  used  both  mean  rankings  of  training 
priority  and  mean  ratings  of  training  priority  in  his  research.  These 
studies  suggested  that  a  promising  link  between  Instructional  System 
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Development  (ISD)  theory,  occupational  survey  data,  and  curriculum 
design  could  be  further  developed. 

Stacy, Thompson, and Thomson (1977) presented a paper last year at
the Military Testing Association Conference outlining preliminary
results of task factor data collection, training emphasis prediction,
and task-anchored scaling. Stacy et al. found that the task training factors
could be collected reliably using standard occupational survey techniques.
They also reported success in using the policy-capturing approach for a
number of specialties. The present paper will discuss the results of
policy-capturing research on 13 Air Force specialties, the similarities
and differences in policies for different specialties, and the implications
of the research for Instructional System Development (ISD). A
separate paper is being presented at this conference by Squadron Leader
David C. Thomson (Thomson and Goody, 1978) documenting the results of
the task-anchored scaling research.

Method 

Research conducted prior to Stacy et al. (1977) focused on two
specialties. This study was designed to test the generalizability of
earlier findings. Therefore, 13 of the 14 specialties studied by
Stacy et al. were selected for this study (Table 1). The specialties
were selected so that occupational survey data and job inventories were
current, initial skill courses were mandatory for entry into the specialties,
and all four aptitude areas used in Air Force job placement
(Mechanical, Electrical, General, and Administrative) were represented.
As a result of the operational occupational surveys conducted by the AF
Occupational Measurement Center, data were available on percent members
performing each task, an index of percent time spent on each task, the
learning difficulty of each task, and the average grade of members
performing each task. Additional data that were collected for the
study included: (a) field recommended training emphasis for each task,
(b) present school emphasis for each task, (c) probable consequences of
inadequate performance for each task, and (d) delay tolerance for each
task. The learning difficulty task factor used in this study was
collected using a nine-point relative scale. However, the other two
factors (consequences of inadequate performance, task delay tolerance)
were collected using nine-point scales (Stacy, Thompson, and Thomson,
1977) that were verbally anchored and did not require relative task
comparisons. These factors have been described previously (Stacy et
al., 1977); however, the training emphasis scale will be described again
in this paper because of its importance.

The field recommended training emphasis scale was developed as the
criterion. It was expected to yield information equivalent to the mean
rank orderings of training priority as used by Mead and Christal (1974),
since Mead (1975) demonstrated the equivalency between rankings and
ratings of training priority. The field recommended training emphasis
scale is a nine-point scale ranging from "Extremely Little" to "Extremely
Heavy." Senior NCOs serving in operational units in each specialty are
Table 1

AFSC Aptitude Areas and Raters

                                                                Number of Respondents/Raters
                                             Aptitude  Members  Training
AFSC   Title                                 Area      Total    Emphasis  Consequences  Delay  Diffic

293X3  Radio Operator                        A         1468     224       45            50     78
304X0  Radio Relay Equipment                 E         1573     215       35            50     89
304X4  Ground Radio Communication Equipment  E         2351     335       60            58     122
328X3  Electronic Warfare                    E         1223     306       46            47     43
472X2  General Purpose Vehicle Mechanic      M         3338     291       33            34     127
552X5  Plumbing Specialist                   M         964      143       82            62     116
651X0  Procurement Specialist                A         979      320       61            63     101
672X1  General Accounting                    A         596      85        55            55     86
672X2  Disbursement Accounting Specialist    A         1352     149       65            65     86
902X0  Medical Services                      G         2198     380       93            95     58
906X0  Medical Administration                G         2356     300       105           104    78
911X0  Physiological Training                G         408      79        30            30     86
981X0  Dental Specialist                     G         1856     89        65            47     45

Total                                                  20662    4220      1096          1098   1451


asked to (a) check each task for which formal training (school or
on-the-job training (OJT)) is recommended for first-term airmen, and (b)
rate each of the tasks that were checked using the nine-point scale.
The training emphasis scale is normally treated in data reduction and
analysis as a 10-point scale since the absence of a check mark is
treated as zero. This differs from other ISD task factors used in the
Air Force occupational survey program since every task is normally
considered to possess some amount of each ISD factor. That is, for
example, no task would be expected to have zero learning difficulty.
Similarly, no task would have zero consequences of inadequate performance
or delay tolerance. Tasks could, however, have zero field recommended
training emphasis.
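
The zero-for-unchecked convention described above can be illustrated with
a minimal Python sketch. The function name and the sample ratings are
hypothetical and are not taken from the study; the sketch only shows how
an unchecked task could be scored as zero before averaging across raters.

```python
from typing import Optional, Sequence


def task_emphasis(ratings: Sequence[Optional[int]]) -> float:
    """Mean training emphasis for one task across raters.

    Each entry is a rating from 1 to 9, or None when the rater did not
    check the task. An unchecked task is scored as zero, which turns the
    nine-point scale into a 10-point (0-9) scale.
    """
    scores = [0 if r is None else r for r in ratings]
    if any(not 0 <= s <= 9 for s in scores):
        raise ValueError("ratings must lie between 1 and 9, or be None")
    return sum(scores) / len(scores)


# Hypothetical ratings from four raters; two did not check the task.
print(task_emphasis([9, None, 6, None]))
```

Scoring the unchecked tasks as zero, rather than dropping them, keeps
every rater in the denominator, so a task that few raters recommend for
training receives a low mean emphasis.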

The field recommended training emphasis scale has been intensively
researched. It has been collected in the research mode for 19
specialties and in the operational mode for an additional 21 specialties.
Table 2 lists the AF specialties and associated interrater agreement
data for the field recommended training emphasis data collected to
date. The median interrater agreement coefficient is .95. Analyses of
rater agreement data suggest that a minimum of 40 raters should be used
to provide reliable results for the recommended training emphasis
scale.
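
The footnote to Table 2 states that the agreement indices were estimated
for a sample of 40 raters with the Spearman-Brown formula. The following
Python sketch shows one way such an adjustment can be computed. It is an
illustration under stated assumptions: the function names and the sample
values are hypothetical, and the study's own computing procedure is not
described in this paper.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


def single_rater_reliability(r_k: float, k: float) -> float:
    """Invert Spearman-Brown: single-rater value implied by r_k for k raters."""
    return r_k / (k - (k - 1.0) * r_k)


def adjust_to_n_raters(r_observed: float, n_observed: float,
                       n_target: float = 40) -> float:
    """Re-express an agreement index observed with n_observed raters
    as the index expected with n_target raters."""
    r1 = single_rater_reliability(r_observed, n_observed)
    return spearman_brown(r1, n_target)


# Hypothetical example: an index of .97 observed with 120 raters,
# restated for a panel of 40 raters.
print(round(adjust_to_n_raters(0.97, 120), 3))
```

Restating every specialty on the same 40-rater footing allows indices
from specialties with very different numbers of raters to be compared
directly.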


The validation of field recommended training emphasis was performed
using policy capturing (Christal, 1968). Policy capturing requires
that a multiple regression model be developed in an attempt to "capture"
the policy of the judges in their ratings or rankings. Basically, it
is the development of explanatory and predictive regression models.
The policy model that was developed to predict field recommended
training emphasis included three task factors and three related job factors,
together with squares of the factors. The task factors in the model
were learning difficulty, probable consequences of inadequate performance,
and task delay tolerance. The job-related factors were percent members
performing in the first assignment, an index of percent time spent by
members in their first assignment, and the average weighted grade of
members performing. Since each factor was squared to address expected
curvilinear relationships, a twelve-variable regression equation was
generated for each Air Force specialty.
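
The twelve-variable equation described above (six factors plus their
squares) can be sketched in Python as an ordinary least-squares fit. This
is a minimal illustration, not the software used in the study: the data
are synthetic, all six factors are drawn from the same 1-9 range for
simplicity, and the function names are hypothetical.

```python
import random


def design_row(factors):
    """Intercept, the six factors, and their squares (12 predictors)."""
    return [1.0] + list(factors) + [f * f for f in factors]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_policy(tasks, emphasis):
    """Least-squares weights from the normal equations X'X w = X'y."""
    X = [design_row(t) for t in tasks]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, emphasis)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(weights, tasks, emphasis):
    """Proportion of criterion variance accounted for by the equation."""
    pred = [sum(w * v for w, v in zip(weights, design_row(t))) for t in tasks]
    mean = sum(emphasis) / len(emphasis)
    ss_res = sum((y - p) ** 2 for y, p in zip(emphasis, pred))
    ss_tot = sum((y - mean) ** 2 for y in emphasis)
    return 1.0 - ss_res / ss_tot


# Synthetic data: 60 tasks, six factor values each (assumed 1-9 range).
random.seed(0)
tasks = [[random.uniform(1, 9) for _ in range(6)] for _ in range(60)]
emphasis = [0.6 * t[0] + 0.4 * t[1] - 0.3 * t[2] + 0.02 * t[3] ** 2
            + random.gauss(0, 0.5) for t in tasks]
weights = fit_policy(tasks, emphasis)
print(len(weights) - 1, "predictors; R-squared =",
      round(r_squared(weights, tasks, emphasis), 2))
```

The squared terms let the fitted equation bend, so a factor whose
influence on recommended emphasis levels off or reverses at high values
can still be captured by a linear regression.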

The ISD literature has often been interpreted as suggesting that
there is one correct way to combine task and job factor information in
order to derive training requirements. This hypothesis was tested by
analyzing the regression equations (Ward, 1963; Gott, 1978) for each
specialty to determine whether, in fact, different policies as expressed
in the policy equation exist across specialties, or, as one might
expect, there is one universal equation (or combination rule). The
analyses required to test the hypothesis can be conducted using a
hierarchical grouping algorithm which tests similar regression equations
for homogeneity of weights.


Table 2

Descriptive Statistics of Training Emphasis Ratings

                                                    Avg           No. of
AFSC   Title                                 Mean   SD     Rkk*   Raters

111X0  Defensive Aerial Gunner               3.49   1.94   .94    43
293X3  Ground Radio Operator                 2.26   1.44   .92    189
303XX  Aircraft Control and Warning Radar    3.08   1.76   .96    52
304X0  Radio Relay Equipment                 2.38   1.50   .94    199
304X4  Ground Radio Communication Equipment  1.82   1.15   .90    315
307X0  Telecommunication Systems Control     2.87   2.11   .96    75
321XX  Defensive Fire Control Systems        2.48   2.09   .97    50
328X3  Electronic Warfare Systems            1.02   1.25   .94    248
341XX  Training Devices                      1.38   1.15   .89    46
361X0  Outside Wire and Antenna Maintenance  3.53   1.67   .94    40
423X1  Aircraft Environmental Systems        2.63   1.34   .91    137
423X4  Aircraft Pneudraulic Systems          2.89   1.56   .93    282
427X2  Nondestructive Inspection             3.89   2.29   .98    178
427X5  Airframe Repair                       3.48   2.16   .97    267
443X0  Missile Maintenance (LGM 25-Titan)    3.91   2.30   .97    53
462X0  Aircraft Armament Systems             2.72   1.81   .96    186
463X0  Nuclear Weapons                       2.05   2.29   .90    40
472X2  General Purpose Vehicle Maintenance   2.39   2.30   .98    243
472XX  Vehicle Maintenance                   3.42   1.75   .95    49
542X2  Electronic, Power Production          3.85   1.54   .93    40
552X5  Plumbing                              3.32   1.62   .95    125
555X0  Programs and Work Control             2.61   1.48   .92    54
571X0  Fire Protection                       3.55   1.87   .95    51
601X4  Packaging                             1.42   1.72   .97    17
602XX  Passenger and Freight                 2.75   1.58   .92    26
622X1  Diet Therapy                          3.68   1.82   .95    47
631X0  Fuel                                  3.22   1.92   .95    277
645X0  Inventory Management                  1.77   1.55   .92    12
645X1  Materiel                              1.27   1.37   .92    15
645X2  Supply                                1.31   2.25   .99    5
651X0  Procurement                           2.85   1.63   .94    295
672X1  General Accounting                    1.29   1.41   .95    73
672X2  Disbursement Accounting               1.21   1.37   .95    131
902X0  Medical Service                       3.44   1.72   .93    302
904X0  Medical Laboratory                    2.91   1.83   .95    46
906X0  Medical Administrative                1.71   1.19   .91    270
907X0  Environmental Health                  3.25   1.82   .95    64
911X0  Aerospace Physiology                  2.59   2.00   .96    68
981X0  Dental                                2.33   2.06   .97    85
982X0  Dental Laboratory                     2.84   2.02   .97    23

*Rater agreement indices for a sample of 40 raters as estimated by the
Spearman-Brown formula.
Analysis of Policy Equations


The results of the grouping analysis of the 13 policy equations
are highlighted in Table 3. Notice that, if the regression equation
derived for each of the 13 specialties were used to predict for that
specialty, the overall predictive efficiency would be quite high (.86).
On the other hand, using a single averaged equation for all of the
specialties would result in an unacceptably low predictive
efficiency of .56. As a result of the analysis, a compromise solution
appears to be one which uses one equation (Policy A) for eight
specialties and a second equation (Policy B) for the remaining five
specialties. The equation used in Policy A yields an R-squared of .72,
whereas the Policy B equation has an R-squared of .64. This suggests
that the specialties in Policy B are not as predictable (using the ISD
factors) as those in Policy A.


Table 3

Grouping of Training Priority Policy Equations

                       Overall
           Number of   Predictive
           Equations   Efficiency (R-squared)

Maximum    13          .86
Optimal    2           .73
Minimum    1           .56
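
The comparison summarized in Table 3 (separate equations, grouped
equations, and one common equation) can be sketched in Python as a greedy
hierarchical grouping. The sketch rests on several assumptions that are
not in the paper: a single predictor stands in for the twelve-variable
equation, the four specialties and their data are invented, overall
R-squared is taken about the grand mean of all tasks, and the merging
rule (join the pair that costs the least R-squared) is a simplified
stand-in for the grouping procedure cited from Ward (1963).

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def sse(points):
    """Sum of squared errors of the best-fitting line through points."""
    a, b = fit_line(points)
    return sum((y - (a + b * x)) ** 2 for x, y in points)


def overall_r2(groups, data):
    """R-squared when each group of specialties shares one equation."""
    all_pts = [p for pts in data.values() for p in pts]
    grand = sum(y for _, y in all_pts) / len(all_pts)
    sst = sum((y - grand) ** 2 for _, y in all_pts)
    total = sum(sse([p for s in g for p in data[s]]) for g in groups)
    return 1.0 - total / sst


def merged(groups, i, j):
    """Copy of groups with groups i and j combined into one."""
    rest = [g for k, g in enumerate(groups) if k not in (i, j)]
    return rest + [groups[i] + groups[j]]


def group_policies(data):
    """Merge groups step by step, always choosing the cheapest merger."""
    groups = [(s,) for s in data]
    history = [(len(groups), overall_r2(groups, data), groups)]
    while len(groups) > 1:
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda ij: overall_r2(merged(groups, *ij), data))
        groups = merged(groups, i, j)
        history.append((len(groups), overall_r2(groups, data), groups))
    return history


# Invented data: two specialties with a rising policy, two with a falling one.
xs = range(1, 10)
data = {
    "S1": [(x, 1.0 + 0.8 * x) for x in xs],
    "S2": [(x, 1.2 + 0.8 * x) for x in xs],
    "S3": [(x, 8.0 - 0.6 * x) for x in xs],
    "S4": [(x, 7.5 - 0.6 * x) for x in xs],
}
for n_equations, r2, groups in group_policies(data):
    print(n_equations, round(r2, 2), groups)
```

The number of equations to retain is a judgment about where the loss in
overall R-squared becomes unacceptable, which is the trade-off Table 3
reports for the 13 specialties.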


Additional analyses of the differences between the two policies
were performed in an attempt to isolate characteristics of specialties
in each policy. The specialties in Policy B differed from those in
Policy A in that Policy B specialties were measured with job inventories
that had significantly more tasks than in A (Xa = 425, Xb = 951;
t = 3.81, df = 11, p<.01) and Policy B specialties included significantly
more job types than Policy A (Xa = 27.1, Xb = 47.6, t = -2.69, df = 11,
p<.05). It is important to note here that no significant relationship
was found between number of tasks in a job inventory and number of job
types identified in an occupational analysis (r = .29, ns).

Analysis  of  Interrater  Agreement 

Although the complex relationships among recommended training
emphasis and the ISD factors will not be more fully developed in this
presentation, one may conclude that there is no single method of
combining ISD factor data to arrive at training emphasis for all
specialties. This conclusion applies if Air Force specialties are
considered the unit of analysis; however, the conclusion has not been
tested, and may not hold up, if the unit of analysis is changed to job
groups within specialties rather than specialties. Nevertheless, the
finding is significant, since most technical training in the Air Force is
developed for specialties and not for job groups.

After determining that there were at least two different policies
that could be used to predict training emphasis, and that the policies
differed in predictive efficiency and type of specialty, further analyses
of interrater agreement were conducted. Although no difference in
interrater agreement was found among the specialties in each policy
group, interrater agreement was found to be moderately correlated with
predictive efficiency. That is, the correlation between R-squared for
each specialty and interrater agreement (Rij) on training emphasis is
significant (r = .61, p<.05).

Table 2 displays the interrater agreement values adjusted for 40
raters. Using a conservative cutoff of .91 for acceptable interrater
agreement, one can see that five (or 12.5 percent) of the specialties
do not meet the cutoff. This analysis leads to the conclusion that the
recommended training emphasis data can be reliably collected in at
least 80 percent of the Air Force specialties.

Complex  Specialties 

The training emphasis research has resulted in a criterion that
may be collected and used in decision making in a large number of
specialties. Two types of problem specialties have been identified in
the research. First, there are specialties with low interrater agreement.
Second, there are specialties for which predictability of recommended
training emphasis is not as high as is necessary for practical prediction.
These complex specialties are being investigated further. The new
research will attempt to (a) determine whether additional factors may
be useful for predicting recommended training emphasis in complex
specialties through the development of additional task factor scales,
(b) examine the complex specialties for common characteristics, (c)
determine optimal data displays for complex specialties for training
decision makers, and (d) determine which specialty characteristics are
associated with low interrater agreement and poor predictability.

Discussion

The task training factor research stream has produced significant
results. First, ISD task and job factors have been identified and
scales developed to measure them. Second, the field recommended
training emphasis scale has been developed as a criterion. The training
emphasis scale has been shown to be reliable, through interrater
agreement analyses in 40 specialties, and valid, through policy capturing
and policy grouping in 13 specialties. Third, no single way of combining
ISD factors for training decisions was found to be appropriate for all
specialties.

The initial objective of the research was to discover combination
rules that may be applied to ISD factor data for selecting tasks for
training. This has been done. However, the rules differ for different
groups of specialties. The criterion used in the research has been
found to be both reliable and valid. Furthermore, only a moderate
number of raters (about 40) is required to provide stable data. These
findings have led to the unexpected conclusion that the criterion
should be collected and not predicted. It is important to note that
recommended training emphasis ratings, although useful for most Air
Force specialties, will not always be immediately usable. In particular,
complex specialties appear to require additional study to enhance
understanding of low predictive efficiency and poor interrater agreement.

As a result of this research, it has been recommended that
supervisory ratings of formal training emphasis be collected routinely in
the Air Force Occupational Survey Program. Further, it has been
recommended that routine collection of the task factors Consequences of
Inadequate Performance and Task Delay Tolerance be discontinued since
recommended training emphasis ratings include consideration of those
factors. In cases where more than one Air Force specialty would be
included in a single job inventory, it is recommended that separate
ratings be collected for each specialty and that those ratings be
analyzed and presented for each of the specialties. Finally, it has
been recommended that the training emphasis data be presented using new
modularized CODAP (Thew & Weissmueller, 1978) programs that allow a
merging of training documentation, such as Specialty Training Standards
(STS) or Plans of Instruction (POI), with job inventory tasks. This
merging of job inventory tasks and training documents provides a simple
and reliable method of displaying occupational survey data within the
context with which training personnel are most familiar. The Appendix
displays an example of the output.

The research leading to the conclusions and recommendations has
been difficult and complex. However, the available technology for
using occupational survey data for training decisions has been
considerably expanded, and implementation of the results would provide a
much stronger basis for making training decisions than is currently
available.
APPENDIX


1. This appendix contains sample computer output that is being
recommended for use by training managers and curriculum developers. The
printout is in two parts; the first part (pp 12-14) is an executive
summary, and the second part (pp 15-16) provides detailed Occupational
Survey (OS) data that have been matched with Specialty Training Standard
(STS) items.

2.  Four  columns  are  on  each  of  the  subsequent  printouts.    Column  1 
displays  the  number  of  OS  tasks  that  have  been  matched  to  each  of  the 
STS  items.    The  data  displayed  in  column  2  are  the  recommended  training 
emphasis  ratings  collected  from  AFS  293X3  (Ground  Radio  Operator)  field 
supervisors.    Column  3  includes  the  percent  of  job  incumbents  with  2-24 
months  military  service  in  the  293X3  career  ladder  who  perform  the 
tasks.    Finally,  Column  4  shows  the  learning  difficulty  for  each  task. 

3.  The  executive  summary  has  been  designed  to  aggregate  OS  task  data  to 
STS  item  level.    Pages  12-14  display  STS  Items  in  original  STS  sequence. 
Mean  values  for  each  of  the  three  OS  task  factors  for  the  tasks  that 
were  matched  to  the  STS  item  are  displayed  in  columns  2-4.    Note  that  in 
the  case  where  no  tasks  have  been  matched  with  an  STS  item,  the  values 
in  the  adjacent  columns  are  zero.    Also,  it  is  possible  to  show  the  same 
STS  items  and  OS  data  with  the  STS  items  arranged  in  descending  order  on 
field  supervisors'  recommended  training  emphasis.    This  display  gives  a 
powerful overview of the data and is expected to be quite useful to
training  managers. 

4.  The  remaining  two  pages  (pages  15-16)  provide  samples  of  detailed  OS 
data  in  the  STS  framework.    STS  items  are  printed  between  dashed  lines 
and  OS  tasks  (and  associated  data)  are  shown  immediately  below  the 
items.    Tasks  are  listed  in  order  of  training  priority  for  introductory 
airmen  within  each  STS  category.    The  matching  between  STS  and  OS  tasks 
was  performed  and  reviewed  by  course  personnel.    Note  that  tasks  may  be 
mapped  into  as  many  STS  items  as  required  and  that  both  STS  items  and  OS 
tasks  may  have  no  counterparts. 
DATA BASE TO DETERMINATION OF TRAINING CONTENT:


A  MANAGEABLE  SOLUTION 


Douglass  Davis 


The Chief of Naval Education and Training (CNET), Naval Air Station,
Pensacola, Florida, is the Chief of Naval Operations (CNO) designee
as the Navy's principal training agent. The CNET participates, inter
alia, in the development and implementation of the most effective
teaching and training systems and devices for optimal education and
training. This paper will describe in some detail a CNET initiative
which is bringing into a manageable focus the historical problem of
determining the content of training programs within the agonizing
limitations of existing, and even diminishing, resources.

Although a framework does exist for determining the training
requirements of naval personnel, there is inconsistency among assigned roles
of the Navy's three "Training Warfare Desks" (within the CNO). This
situation undoubtedly springs from the fact that within the United
States Navy there are three distinct communities: air, surface, and
submarine. The distinctions are so prevalent that personnel within
the employ of the Navy Department often speak of three "separate
Navies." To illustrate the reality of this situation, one need only
refer to the CNO instruction which delineates the functions of the
three individual Training Warfare Desks: OP-29, OP-39, and OP-59,
for submarine, surface, and aviation manpower and training requirements,
respectively.

A function of the Submarine Manpower and Training Requirements Division
is the identification and establishment of training concepts and
requirements; the corresponding function of the Surface Warfare Manpower
and Training Requirements Division is the identification of
requirements and the establishment of priorities for assigned training programs.
The Aviation Manpower and Training Division, however, is tasked with
developing requirements for aviation training courses of instruction
conducted by the CNET and with exercising curriculum control and
ensuring a continuum of training by coordinating the integration and
standardization of flight, aviation ground, and aviation technical
training conducted by the Chief of Naval Education and Training.

The clue to dealing with this disparity lies perhaps in the one
common function among the three Training Warfare Desks: developing
(or establishing) training requirements. The vehicle for
systematically specifying requirements lies in the surface (OP-39) function
of establishing priorities for assigned training programs. It is the
implementation of the Instructional Systems Development (ISD) process
which enables the CNET and the respective Manpower and Training
Warfare Divisions to make possible the development (quantifiable
statement) of requirements and the establishment of priorities within
stated requirements.
An early product of the ISD process is a Job Task Inventory (JTI),
or list of tasks which school or course graduates may reasonably
be expected to perform in their fleet (or shore) assignments. It
is the JTI which actually serves as a statement of training
requirements and gives CNET and sponsors at the CNO echelon a data base
from which to negotiate in the ultimate determination of training
content. This process of negotiation of training requirements has
been in progress since early March of this year (1978) following
the critique of the Radioman (RM) "A" School proposed curriculum
validation at the Service School Command, San Diego. This critique
was attended by representatives from the RM Technical Advisor, the
CNO rating advisor, and the Commanders in Chief, Atlantic and
Pacific Fleets.

At this critique, primarily because of the attendees' inability to
agree upon curriculum content, the concept of prioritization of JTI
items (job tasks) was introduced by CNET representatives. The plan,
which has been recently carried out to completion, involved the
forwarding of a CNET-developed JTI to CNO for subsequent distribution
to the Rating Technical Advisor (COMNAVTELCOM), the Commander in
Chief Atlantic Fleet (CINCLANTFLT), and the Commander in Chief
Pacific Fleet (CINCPACFLT) and their type commanders (air, surface,
and submarine). Each recipient of the JTI prioritized the list of
tasks from most to least critical and forwarded the prioritized
listing up the chain of command to CNO. In early October 1978,
CNO forwarded the consolidated prioritized list of tasks to CNET as
a formal statement of training requirements for RM "A" School
(apprentice) trainees. The prioritization contains three sections:
Priority A - Major Tasks identified as CRITICAL; Priority B - Tasks
identified as IMPORTANT; and Priority C - Tasks identified to be
included if practicable, for example:

CATEGORY A: (CRITICAL)

Receive top secret material
Receive secret material
Process confidential material

CATEGORY B: (IMPORTANT)

Update crypto center files
Perform operator maintenance on TSEC/KW-26

CATEGORY C: (Include if practicable)

Inventory parts/tools/supplies

Upon receipt of the prioritized JTI, CNET has begun to study the
requirements so stated in order to determine exactly how far down
the list of tasks the Naval Education and Training Command can
afford, within current assets, to successfully develop training
programs to satisfy fleet and OPNAV expectations. For the first
time, CNET is able to work from prioritized, approved lists of
requirements. When resources have been exhausted, CNET can continue
this cooperative endeavor with CNO to determine the placement of
tasks which cannot be trained in the RM "A" School within the bounds
of present numbers of student billets, school staff billets,
equipments, and OPN/O&MN funding. Of course the CNO will have the
option of reallocating resources to the RM "A" School or of supporting
additional resource allocations in the outyears. Exercising this
option may include the assignment of training tasks to On-the-job
Training (OJT), to Self-Training Exportable Packages (STEPs), or to
Rate Training Manuals and/or Career Correspondence Courses.
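
The idea of working down a prioritized list until current assets are
exhausted can be sketched in Python as follows. The task names come from
the examples above, but the cost figures, the budget, the single cost
measure, and the function and field names are all hypothetical; the paper
does not state how resource requirements are measured.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    priority: str  # "A" (critical), "B" (important), "C" (if practicable)
    cost: float    # hypothetical training cost, e.g. instructor hours


def allocate(tasks: List[Task], budget: float) -> Tuple[List[Task], List[Task]]:
    """Walk down the prioritized list and stop at the first task that
    no longer fits; everything below the cut line is deferred."""
    order = {"A": 0, "B": 1, "C": 2}
    ranked = sorted(tasks, key=lambda t: order[t.priority])  # stable sort
    in_school: List[Task] = []
    deferred: List[Task] = []
    remaining = budget
    for i, task in enumerate(ranked):
        if task.cost > remaining:
            deferred = ranked[i:]
            break
        in_school.append(task)
        remaining -= task.cost
    return in_school, deferred


jti = [
    Task("Receive top secret material", "A", 10),
    Task("Receive secret material", "A", 8),
    Task("Process confidential material", "A", 6),
    Task("Update crypto center files", "B", 5),
    Task("Perform operator maintenance on TSEC/KW-26", "B", 20),
    Task("Inventory parts/tools/supplies", "C", 3),
]
taught, deferred = allocate(jti, budget=30)
print("Taught in school:", [t.name for t in taught])
print("Deferred:", [t.name for t in deferred])
```

In this sketch the deferred tasks are the candidates for placement in
other settings, such as OJT or correspondence courses, or for a request
for additional resources.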

This venture, emanating from the data base created by application of
the ISD process, enables all concerned to plan career training and
to understand the rationale which determined the placement of training in
a particular setting or at a particular point in the career of
enlisted RMs. The dilemma of curriculum content will now begin to
diminish as determination of curriculum content is removed from those
who develop curricula and is placed in the hands of those who have
actually been charged for many years with determining training
requirements.

As one would well expect, Fleet recipients of course graduates find
their jobs easier when course graduates have been trained to perform
at identified, approved high levels of competency. Ideally, CNET
would ensure that graduates meet or exceed the expectations of their
supervisors. It is these expectations and the limitations of
resources and student billets which have made for misunderstandings,
illogically derived course content, and generally uncomfortable
feelings among the Fleets who receive graduates and the command (CNET)
that develops and administers training programs. Admittedly, there
has been confusion concerning expected and actual performance of
CNET course graduates. This situation has existed for several years,
primarily because of CNET's having been forced to remove "extraneous"
material from courses and to train only to "need to know" in order
to shorten courses whenever practicable to ease the impact of
decreasing resources.

A follow-on to prioritization of training requirements and development
of courses which reflect this prioritization is the development and
refinement of a concept which, once implemented, will serve to
preclude misunderstandings on the part of active users of course
graduates. A Skills Profile (SP) will be developed for the purpose
of "profiling" the job entry level for RM "A" (and ultimately all
other) School graduates. The SP will enumerate the skills possessed
by graduates and will be made available to all cognizant activities
via the Catalogue of Navy Training Courses (CANTRAC) microfiche
medium. Such a precise statement of capabilities to which a
graduate has been trained will provide a definitive baseline against
which job performance can be evaluated and from which a Fleet
feedback system and a training readiness index can be implemented.
The cooperative CNO-CNET effort to provide a data base and the
subsequent prioritization of training requirements which it supports
will provide CNO sponsors a basis upon which to base decisions (the
"who, what, when and where" of training), while making possible the
realization of the actual CNET role: applying expertise in designing,
developing, implementing, and evaluating (the "how" of training)
courses which will, more than ever before, meet Fleet requirements.
USING THE COMPUTER TO BUILD THE TASK INVENTORY


T.  Ansbro 

Career  Development  Group,  Naval  Education  and  Training 
Program  Development  Center,  Pensacola,  Florida 

At the "front end" of Instructional Systems Development (ISD),
occupational data stockpiles, especially when data gathering is
enthusiastically and thoroughly pursued. Some of the data gathering
for Training Task Analysis in the Navy has so far produced thousands
of tasks per rating, and there is no evidence to suggest a change in
the trend. As the data-gathering techniques necessarily (and
unavoidably) become more sophisticated and complex, the chances are increased
that the data recorded will be sufficiently comprehensive to permit
follow-on analysis to perform its design function in development of
training curricula and materials, certainly throughout and hopefully
far beyond any initial iteration of ISD. Except for technological
change or significant adjustments in manpower management, there should
be little need for more than a periodic augmentation to a data storage
that has been assembled with a broad compass of retrieval strategies
in mind.

One key to the projected employment of these occupational data
items (tasks) is the "signature block" of each task recorded during
job/task analysis; another is the computer programming that permits
grouping and regrouping of recorded tasks into arrays and hierarchies
reflective of representative equipment items, levels of work
sophistication, established or innovative occupational structures, or
internal task-descriptive hierarchies. To proceed successfully through
Training Task Analysis, an important phase of ISD, it is first necessary to
provide job task inventories that indicate relationships among tasks,
as well as merely list them, and that describe, classify, and
catalogue. Such inventories must also be capable of rearrangement of tasks
to meet specific requirements by means of a variety of retrieval
strategies. These inventories can be built in and by the computer; task
interrelationships can be determined, commonality or uniqueness
established and measured, and degree of componency and index of complexity
fixed. The initial data input is an inventory, to be sure; but,
except to serve as a master index of tasks ascribed to a rating, it is
not the single or principal such instrument employed in Training Task
Analysis in ISD.

This  paper  will  treat  a  range  of  inventories  and  the  methodology 
used  to  produce  and  modify  them  by  describing  a  model  developed  and 
currently  in  experimental  use  by  the  Career  Development  Group,  a  unit 
of the Naval Education and Training Program Development Center,
Pensacola, Florida. The model shown represents one attempt to secure a
massive occupational data input and then to trim it down to an easily
manageable  catalogue  from  which  to  select  items  for  the  follow-on 
steps  of  ISD. 
The task inventory, fully explored and exploited, is more than a
list of tasks covering work done within a rating, although the Master
Index (figure 1, sample page) is just that. Inventories for use in
Front End Analysis (FEA) for ISD meet training task analysis
requirements other than those of indexing. For example, inventories can be
printed out by equipment hierarchies (platform/system, equipment item,
component, module; figure 2), or by established "skill levels"
(paygrade groupings), or in divisions or sections specialized to meet other
expressed needs of the ISD process. It is the retrieval strategy
applied to a multilayered, detailed, and comprehensive occupational
data input that makes the varied outputs (the inventories mentioned above)
capable of practical employment in such further steps as Training Task
Analysis (TTA).
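
The retrieval idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used by the Career Development Group: the record layout, the field names, the function name, and the skill_level values are hypothetical, while the equipment values echo the sample in figure 2.

```python
from collections import defaultdict

# Hypothetical task records. Field names and skill_level values are
# illustrative assumptions; the equipment values echo figure 2.
MASTER_INDEX = [
    {"task": "0017", "platform": "P-3", "system": "BOMB NAV",
     "equipment": "AN/ASN-42", "component": "NOT CODED", "skill_level": 2},
    {"task": "0023", "platform": "P-3", "system": "BOMB NAV",
     "equipment": "AN/ASN-42", "component": "NOT CODED", "skill_level": 2},
    {"task": "0320", "platform": "P-3", "system": "BOMB NAV",
     "equipment": "AN/ASN-42", "component": "CP-632", "skill_level": 3},
    {"task": "0319", "platform": "P-3", "system": "BOMB NAV",
     "equipment": "AN/ASN-42", "component": "CP-632", "skill_level": 3},
]


def retrieve_inventory(master_index, *keys):
    """Regroup task numbers under the values of the chosen descriptive keys."""
    inventory = defaultdict(list)
    for record in master_index:
        group = tuple(record[key] for key in keys)
        inventory[group].append(record["task"])
    return dict(inventory)


# Two retrieval strategies applied to the same master index.
by_equipment = retrieve_inventory(
    MASTER_INDEX, "platform", "system", "equipment", "component")
by_skill_level = retrieve_inventory(MASTER_INDEX, "skill_level")

for group, tasks in by_equipment.items():
    print(" / ".join(group), "->", ", ".join(tasks))
```

Each specialized inventory holds only task numbers, so the full record for any task is recovered by looking the number up in the master index again.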

Principal  objectives  for  the  data  input  design  are  that  the  data 
be  detailed,  extensive,  and  reflective  less  of  a  technician's  opinion 
than of his recognition and recall of characteristics descriptive of
tasks.     To  this  end,  tasks  to  be  recorded  in  FEA  are  fitted  loosely 
into  a  data  structure  that  becomes  progressively  more  finite  at  each 
lower  level  of  description.     This  data  structure  is  a  Navy  world-of- 
work  frame  of  descending  categories  of  tasks  in  what  eventually 
becomes  an  inventory.    The  major,  or  first,  divisions  (Major  Functional 
Categories)  indicate  the  broadest  clearly  distinct  categories  of  work 
that  tasks  fall  into,  irrespective  of  the  official  boundaries  of  a 
rating  under  analysis.    The  second  (next  lower)  are  the  Duty  Subcate- 
gories, work-descriptive  areas  of  smaller  compass,  within  which  are 
the Task Descriptive Characteristics and the Skill Areas (see figure 3).

The  first  two  divisions  essentially  place  the  tasks  to  be  recorded; 
they  define  or  re-define  areas  of  task  population.    The  Descriptive 
Characteristics  and  Skill  Areas  provide  extensive  and  varied  items 
descriptive  of  task  actions  and  behaviors:  skill-related,  tool-and- 
equipment-oriented,  explicit  actions  in  a  graduated  format.     It  takes 
the  recording  of  many  of  these  characteristics  to  make  up  the  task 
"signature  block"  data  input  in  the  computer;  but,  in  the  aggregate, 
they  draw  the  picture  of  the  task  (the  signature  block  is  the  solid 
block  of  numbers  from  zero  (0)  to  three  (3)  below  the  statement  "Task 
Data Worksheet Information" in figure 1).
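
As a rough illustration of how such a signature block might be held in a program, the following Python sketch reads rows in the style of figure 1 into lists of integers. The function name and the parsing rule (a row label followed by a run of digits 0 to 3) are assumptions made for this example; the paper does not describe the stored format.

```python
import re

# Rows copied in the style of the figure 1 printout (abbreviated sample).
SAMPLE_BLOCK = """\
GENERAL     31111000000000000000000000000
DUTY SUB 01 33333330000000000000000000000
SKILL     1 21133000000000000000000000000
SKILL     5 22332111300000000000000000000
"""

# Assumed row shape: a label, whitespace, then at least eight digits 0-3.
ROW_PATTERN = re.compile(r"^(?P<label>.+?)\s+(?P<digits>[0-3]{8,})$")


def parse_signature_block(text):
    """Return {row label: list of 0-3 entries} for each recognizable row."""
    block = {}
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            label = " ".join(match.group("label").split())
            block[label] = [int(digit) for digit in match.group("digits")]
    return block


signature = parse_signature_block(SAMPLE_BLOCK)
print(signature["SKILL 5"][:9])
```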

The initial recording instrument is the Job Data Worksheet (JDW)
(figure 5). All job/task information, with the exception of the task
signature block and the complexity index, is transferred to the
computer from this form (compare figures 1 and 5).

The second data recording instrument is the Task Data Worksheet
(TDW) (figure 6). This instrument records the appropriate Task
Descriptive Characteristics and the applicable Skill Areas, all of which
are transferred to the computer, reappearing in the printout as the
signature block. It is at this point that the computer actually
performs computation. All items appearing in the printout (figure 1)
except the complexity index (figure 1, "Complexity 2.17") represent
merely a printed-out restatement of task data input or identification.
The complexity index is a [illegible] factor with a range of zero (0) to
five (5) resulting from computer manipulation of predetermined weights
for the above-mentioned entries in the task signature block.
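
The paper does not give the weights or the formula behind the complexity index. Purely as a sketch of what "manipulation of predetermined weights" could look like, the Python example below assumes a weighted mean of the 0 to 3 signature entries rescaled to the stated 0 to 5 range. The function name, the formula, and the sample weights are all hypothetical.

```python
def complexity_index(entries, weights):
    """Weighted mean of 0-3 signature entries, rescaled to the 0-5 range.

    Assumed formula (not from the paper):
        index = 5 * sum(w_i * x_i) / (3 * sum(w_i))
    """
    if len(entries) != len(weights):
        raise ValueError("entries and weights must have the same length")
    if any(not 0 <= x <= 3 for x in entries):
        raise ValueError("signature entries must lie between 0 and 3")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * x for w, x in zip(weights, entries))
    return round(5.0 * weighted_sum / (3.0 * total_weight), 2)


# Hypothetical weights for a five-entry signature row.
example_weights = [2.0, 1.0, 1.0, 1.0, 1.0]
print(complexity_index([3, 1, 1, 1, 1], example_weights))
```

Under this assumed formula a signature of all zeros yields 0.0 and a signature of all threes yields 5.0, which matches the range stated in the text; any intermediate value depends entirely on the chosen weights.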

Meticulous recording of these data provides the Master Index or
total task inventory of a rating (or any other identified occupational
group). The nature and shape of other inventories results from
application of a variety of retrieval strategies to the
Master Index. Such inventories as that shown in figure 2 (equipment
hierarchy) are essentially "shorthand" types of specialized inventories
(the equipment "level," [illegible]-to-module). To get all the data recorded
on any specific task, one has only to track the task number back
from the specialized (or other) inventory to the Master Index.

With no more sophisticated input than that shown, the computer can
screen out all identical tasks (all items in a task signature block
identical to all items in other tasks). Task "similarity" (agreed-upon
percentage of identity) and, therefore, "commonality," may be
determined by the computer. Using the derived complexity indices and
a program designed to explore task interrelationships, "componency"
(the degree to which a task is included in, and is therefore "component
to," another of established higher complexity) may also be determined
(see figure 7). Commonality, complexity, and componency of tasks in
any inventory are determined by the computer, not by the subject matter
personnel recording the data.

In the model shown here, the ability of the computer to "look at
tasks with the same eye" (for cataloguing tasks) and to do it reliably
and with tireless repetition is fully exploited. Summary decisions,
formerly the province of the subject matter expert (SME), have been
literally disassembled into numerous and specific items of descriptive
data for selection and application to task-data recording by the SMEs,
recorded by them, then reassembled by the computer into such decision
patterns as those mentioned above (complexity, etc.). Few SMEs can
match the computer's ability to compare a master inventory of 3,000 to
5,000 tasks for commonality in a single step (in some duty
subcategories of ratings, sixty percent of tasks recorded proved to be
identical, thereby substantially reducing the size of the inventory
while not affecting its compass).

The philosophy followed in developing this model is that, unless data
gathering, opinion infiltration, and actual analysis are recognized as
separate (although progressive) steps in task analysis, and this
separateness is maintained, opinion infiltration eventually advances in
both directions, muddying the entire effort. It is difficult for an
SME to compare task #2[illegible]5 with #165, having previously made
judgments on the comparative commonality of #102 and #86, without
succumbing to the halo effect or some other flowering bias, fatigue, or
nagging second thoughts. The computer's monolithic programming leaves it
undisturbed by these problems. SME opinion is employed copiously
where individual technical expertise, recall of detail, understanding
of systems, broad summary judgements, and examination and verification
of the computer-made decisions can refine and validate findings and the
products of analysis.
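
The screening for identical and similar tasks described above can be illustrated with a short Python sketch. It assumes that each task's signature block has been flattened into one tuple of 0 to 3 entries and that "similarity" is the percentage of positions on which two signatures agree; the task labels, the function names, and the 75 percent threshold are hypothetical, since the paper leaves the agreed-upon percentage open.

```python
from collections import defaultdict

# Hypothetical flattened signatures; real blocks are far longer.
SIGNATURES = {
    "T001": (3, 1, 1, 0),
    "T002": (3, 1, 1, 0),
    "T003": (3, 1, 2, 0),
    "T004": (0, 0, 0, 3),
}


def group_identical(signatures):
    """Return lists of task labels whose signatures match on every item."""
    buckets = defaultdict(list)
    for task, signature in signatures.items():
        buckets[tuple(signature)].append(task)
    return [tasks for tasks in buckets.values() if len(tasks) > 1]


def similarity(sig_a, sig_b):
    """Percentage of positions on which two signatures agree."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return 100.0 * matches / len(sig_a)


def common_pairs(signatures, threshold):
    """Pairs of tasks whose similarity meets the agreed-upon threshold."""
    tasks = sorted(signatures)
    return [
        (a, b)
        for i, a in enumerate(tasks)
        for b in tasks[i + 1:]
        if similarity(signatures[a], signatures[b]) >= threshold
    ]


identical = group_identical(SIGNATURES)
similar = common_pairs(SIGNATURES, threshold=75.0)
print(identical, similar)
```

Grouping by the whole signature finds identical tasks in one pass over the inventory, which is the kind of single-step comparison the text credits to the computer; the pairwise similarity check is the slower, more general case.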
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ERIC 


N0.****110    JOB  DATA  WORK  SHEET  INFORMATION 


AE-CX:-118 


ERIC 


RATING    TASK    PACKAGE    TASK-ACTION-CODE  DUTY-SUBCATEGORY 
AE  0118    0001         IFT  01 

ACTION  =  IFT  (ISOLATE  FAULT/TROUBLESHOOT)    ISOLATE  FAULT/TROUELESHOOT  WHEEL 

WELL  LIGHTS  44125 


PLATFORM    =  P3A/B 

EQUIPMENT  =  EXTERIOR  LIGHTING 


SYSTEM       =  LIGHTING 

COMPONENT  -  WNEEL  WELL  "LIGHTING 


CUES,  REFERENCES,  STANDARDS,  ETC.,  REFERRED  TO  BY  THIS  TASK. 

CUE  MALFUPICTION 

CUE  OPERATIONAL  CHECK 

STANDARD  lAW  REFERENCE  PUBLICATION 

REFERENCE  NA-01-75-PAA-1-12 

REFERENCE  NA-Ol-iA-505 

TOOLS  COMMON  HAND  TOOLS 

SUPPORT  EQUIP.. .POWER  UNIT  NC12/12A 
SUPPORT  EQUIP... AIR  CONDITIONER  NB-3A 
TEST  EQUIP  MULTIMETER  PSM-4 


TASK  DATA  WORK  SHEET  INFORMATION. 


COMPLEXITY  2.17 


GENERAL  31111000000000000000000000000 

DUTY  SUB  01  33333330000000000000000000000 

SKILL       1  21133000000000000000000000000 

SKILL       2  12110000000000000000000000000 

SKILL       4  23332320000000000000000000000 

SKILL       5  22332111300000000000000000000 


FIGURE  1 


SAMPLE  PAGE,  MASTER  INDEX  PRINTOUT 
AE-0010-0017-02-RAR-2
ACTION    = REMOVE & REPLACE MOD 7352904
PLATFORM  = P-3                        SYSTEM     = BOMB NAV
EQUIPMENT = AN/ASN-42                  COMPONENT  = NOT CODED
MODULE    = NOT CODED                  COMPLEXITY = 0.65

AE-010-0023-02-ALI-2
ACTION    = ALIGN AN/ASN-42 NAV CPTR SET 73520
PLATFORM  = P-3                        SYSTEM     = BOMB NAV
EQUIPMENT = AN/ASN-42 NAV CPTR SET     COMPONENT  = NOT CODED
MODULE    = NOT CODED                  COMPLEXITY = 2.31

AE-0010-0320-02-ALI-3
ACTION    = ALIGN CP-532/ASN-42 NAV CPTR 7352400
PLATFORM  = P-3                        SYSTEM     = BOMB NAV
EQUIPMENT = AN/ASN-42                  COMPONENT  = CP-632
MODULE    = NOT CODED                  COMPLEXITY = 1.61

AE-010-0319-02-RAR-3
ACTION    = REMOVE & REPLACE 4A32 PWR AMP ASSY 7352440
PLATFORM  = P-3                        SYSTEM     = BOMB NAV
EQUIPMENT = AN/ASN-42                  COMPONENT  = CP-632
MODULE    = NOT CODED                  COMPLEXITY = 0.51

FIGURE 2

SAMPLE PAGE, EQUIPMENT HIERARCHY PRINTOUT
MAJOR FUNCTIONAL CATEGORY

*  MAINTENANCE
** FABRICATION/PRODUCTION
   OPERATIONS
   PERSONNEL SERVICES
   ADMINISTRATIVE SERVICES
   INFORMATION SERVICES (MEDIA)
   MILITARY

DUTY SUBCATEGORY

*  A. CHECKING/TESTING/INSPECTING
   B. REPLACING/RESTORING ITEMS
   C. ADJUSTING/ALIGNING/CALIBRATING
   D. REPLENISHING/LUBRICATING
   E. CLEANING/PRESERVING

** A. CHECKING/TESTING/INSPECTING
   B. DESIGNING/PLANNING/LAYING-OUT
   C. CONSTRUCTING/ASSEMBLING
   D. EVALUATING/EARTHMOVING
   E. DESTRUCTING/DISMANTLING
   F. FINISHING/TRIMMING/DECORATING

FIGURE 3    DATA STRUCTURE, MAJOR CATEGORIES


GENERAL

ACCESSIBILITY (Concerns getting to the object to be worked on)

a. Easily accessible; of little consequence in complexity of task.

b. Moderately accessible, e.g., requires opening drawers, removal of
   plates, panels, boots, covers, or minor components; climbing, etc.

c. Difficult to gain access, e.g., requires disassembly or removal of
   other components.


MAJOR FUNCTIONAL CATEGORY: FABRICATION/PRODUCTION
DUTY SUBCATEGORY: DESIGNING/PLANNING/LAYING OUT

SPECIFICATIONS AND MEASUREMENTS

a. Specifications provided; only static measurements required.

b. Specifications provided; dynamic measurements required.

c. Specifications must be derived; dynamic measurements required.


MAJOR FUNCTIONAL CATEGORY: MAINTENANCE
DUTY SUBCATEGORY: REPLACING/RESTORING ITEMS

REMOVAL/REPLACEMENT

a. Simple change of location: requires no fastening/unfastening,
   e.g., lift, push aside, etc.

b. Dual action: requires fastening/connecting/unfastening/disconnecting
   in addition to change of location.

c. Multiple action: requires other supporting actions in addition to
   fastening/connecting/unfastening/disconnecting and change of location.


SKILL AREA: (5) USING TEST EQUIPMENT

OPERATION

a. Built into system or requires no connection to system and provides
   automated readings after initial set up.

b. (1) Built into system or requires no connections to system but
       requires manual step-by-step procedure to obtain readings, or

   (2) Must be connected to system but provides automated readings.

c. Must be connected to system and requires manual step-by-step
   procedure to obtain readings.


FIGURE 4    DATA STRUCTURE, CONTINUED
FIGURE 5

JOB DATA WORKSHEET

RATING: HT

CUES
1. PMS, quarterly
2. When damaged
   When ship sustains damage

REFERENCES
1. [illegible]
2. NAVSEA 0901-[illegible]
   Chap. [illegible]

SUPPORT MATERIAL
1. Clean rags
2. Paint remover
3. 1" paint brush
4. Hardwood block
5. Carpenter's chalk
6. Symbol 2190 TEP oil
7. [illegible]
8. [illegible] stick packing, MIL P-17578 [illegible]
9. [illegible] aluminum oxide abrasive cloth
10. Gasket material

TEST EQUIPMENT
1. Tape measure

STANDARDS
1. IAW reference

TOOLS
1. Scraper
2. Wire brush
3. Allen wrench set
4. 12" adjustable wrench
5. 8" normal-duty screwdriver
6. Ball peen hammer
7. Hand chisel
8. Drift punch
9. Welding hood
10. Cutting goggles
11. Electrode holder
12. Chipping hammer
13. Gasket punch

SUPPORT EQUIPMENT
1. Flash light
2. Oil can
3. Grease gun
4. Tool box
5. Eye protection shield
6. Bucket
7. Oxy fuel cutting torch
8. Welding machine
9. Electric grinder
10. Soap stone
11. Welding electrodes
12. Spark igniter
13. Leather gloves

OTHER CONDITIONS
SYSTEMATIC INSTRUCTIONAL VALIDATION THROUGH TESTING


Introduction 


Systematic  development,  implementation,  and  evaluation  of  instruc- 
tion has  gained  increasing  attention  as  the  aspects  of  accountability, 
efficiency,  and  effectiveness  of  education  or  training  have  received  more 
emphasis.     Instructional  systems  development  (ISD)  is  essentially  appli- 
cation of  a  systems  approach  to  educational  process.     Steps  in  this 
approach  are  basically  (1)  determining  instructional  needs,  (2)  developing 
effective  and  efficient  solutions  to  these  needs,   (3)  implementing  solu- 
tions, and  (4)  assessing  the  degree  to  which  these  needs  are  met. 

For  new  instructional  programs,  ISD  can  be  logically  and  effectively 
applied.    However,  for  existing  programs,  a  comprehensive  testing  plan 
will  provide  an  effective  alternative.    The  testing  plan  is  designed  so 
that,  if  ISD  is  later  applied  to  the  instruction,  the  methods  used  and 
data  collected  will  be  applicable  to  and  consistent  with  ISD.     The  pur- 
pose of  this  paper  is  to  present  a  technique  for  validating  instructional 
programs  through  course  testing  instruments  in  order  to  supplement  the 
development  process  used. 


Testing  integral  to  instruction 

Testing  serves  two  main  purposes  in  Navy  health  sciences  education 
and  training:     (1)  to  assess  student  knowledges  and  skills  acquired  while 
participating  in  training  activities;  and  (2)  to  assess  carryover  of  knowl- 
edges and  skills  to  real-life/actual  job  settings.     For  these  purposes, 
at  least  three  aspects  must  be  measured:     cognitions,  motor  skills, 
and  application  of  cognitions  and  skills  in  the  job  setting.    For  each  of 
these aspects, numerous instructional objectives exist for a given course,
these objectives giving substance to this otherwise theoretical
distinction. Test items are designed to represent and conform to
objectives and the methods of instruction.

For  testing  to  assess  the  effect/success  of  instruction,  it  is 
essential  that  tests  measure  the  outcome  of  instruction  at  whatever  level 
of  detail  the  instruction  is  given.     The  key  to  determining  the  effective- 
ness of  instruction  is  the  precision  with  which  what  is  taught  is  tested. 
Test items must measure specific behavior, with the conditions under
which the behavior is to be achieved and the manner in which the behavior
and  conditions  are  to  be  demonstrated  established  by  the  objectives. 

The  number  of  written  test  items  and  performance  assessments  that  can 
be  generated  to  adequately  represent  all  the  instruction  conveyed  in  a 
particular  course  or  program  is  almost  always  more  than  can  practically 
be  administered  to  any  student.     Sampling  of  instructional  content  or  of 
testing  mechanisms  is  usually  done  to  reduce  the  amount  of  actual  testing 
to  a  proportion  of  the  total.     Selection  of  test  items  and  instruments  for 
use  at  any  one  time  can  be  done  by  random  or  stratified  sampling  procedures 
or a combination of both; the objectives of the instruction will usually
guide the choice of sampling procedures. Whatever the selection process
used, all testing mechanisms need to be validated beyond the face and
content  validity  built  in  during  development.     Typically,  this  validation 
takes  the  form  of  concurrent  or  predictive  validity  studies  where  appro- 
priate criterion  measures  are  available  or  developed,  against  which  the 
new  tests  are  compared. 


Field validation

A  different  approach  to  validation,  however,  is  more  appropriate  for 
specialized training, particularly when ISD has not been used in program
development.     This  approach  is  closely  linked  to  the  second  purpose  of 
testing:     to  assess  carryover  of  knowledges  and  skills  to  the  real-life/ 
actual  job  setting.     Validation  through  testing  can  be  accomplished  only 
if  direct  input  is  obtained  from  appropriate  "field"  specialists  or  prac- 
titioners.    In  traditional  curricular  development,  tests  are  devised  to 
correspond  to  instruction.     It  is  essential,  however,  to  extend  validation 
by  determining  the  extent  to  which  instructional  content  and  tests  corres- 
pond to  job  requirements. 

The  process  by  which  tests  can  be  validated  against  job  requirements 
can  be  applied  to  any  type  of  testing  mechanism.     Two  specific  examples 
will  be  given  here,  one  of  written  test  items  and  one  a  performance  rating 
scale  which  are  part  of  a  16-week  (640-hour)  course  for  otolaryngology 
(ENT)  technicians.     This  course  consists  of  five  content  units:  anatomy 
and  physiology,  ENT  surgery,  clinic  technique,  operating  room  procedures, 
and  audiology.     The  expressed  purpose  of  the  course  is  to  "provide  trained 
enlisted  personnel  with  the  knowledges  and  skills  needed  to  assist  medical 
officers in the treatment and care of patients with otolaryngology disorders"
(Catalog  of  Navy  Training  Courses,  June,  1978,  Vol.  2). 

Validation  of  written  test  items 

For  field  validation  of  test  items  for  this  course,  a  sample  of  Oto- 
laryngologists (ear,  nose,  and  throat  (ENT)  specialists)  was  chosen  based 
on  the  following  criteria:     (1)  the  physicians  were  on  a  hospital  staff 
(Navy  Regional  Medical  Center);   (2)  three  or  more  ENT  technicians  were 
assigned to assist the physicians in the clinic and operating rooms of the
hospital;  and  (3)  the  physicians  were  the  immediate  supervisors  of  one  or 
more  ENT  technicians. 

The  physicians  were  directed  to  judge  how  important  information  con- 
tained in  each  test  item  was  for  the  technician's  performance  of  his 
clinic  and  operating  room  duties.     To  obtain  these  judgements  in  a  system- 
atic way,  test  items  were  presented  in  a  rating  scale  format — each  item 
preceded  by  five  response  columns. 

The  statement  to  which  the  physician  responded  was:     "The  item  tests 
information  that  is  essential  to  the  technician's  job  performance." 
Judgement of the importance of the content of the test question was
expressed by indicating how much he agreed or disagreed with this
statement. The five columns located to the left of each item were labeled
SA (Strongly Agree), A (Agree), U (Undecided), D (Disagree), and SD
(Strongly Disagree). The specialist was instructed to mark an "X" in the
appropriate column to represent his judgement about the essentialness of
the technician knowing the item's content. (See Tables 1.1 and 1.2.) At
the end of each section of the test (there were five sections corresponding
to the five content units of the course) was a Comments page on which
the physician could note topics that were not included but which should be
tested.

Responses to the rating scale were received from 8 of the 9 Regional
Medical Centers. A frequency tally was done of the ratings given each
test item. An item for which more than half of the ratings fell below
the midpoint, i.e., five or more responses were in the columns of
"Disagree" and "Strongly Disagree," was considered to be judged non-essential.
Of  the  original  200  test  items  submitted  for  review  by  the  Otolaryngol- 
ogists, 45  were  judged  to  test  non-essential  information. 
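
The tally and the decision rule just described can be expressed compactly. The Python sketch below is an illustration only: the function names and the sample ratings are hypothetical, while the five column labels and the "more than half below the midpoint" rule are taken from the text.

```python
from collections import Counter

# Response columns of the rating scale, from strongest agreement down.
COLUMNS = ("SA", "A", "U", "D", "SD")


def tally(ratings):
    """Frequency of each response column for one test item."""
    unknown = set(ratings) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unrecognized ratings: {sorted(unknown)}")
    counts = Counter(ratings)
    return {column: counts.get(column, 0) for column in COLUMNS}


def is_non_essential(ratings):
    """True when more than half of the ratings are D or SD."""
    counts = tally(ratings)
    below_midpoint = counts["D"] + counts["SD"]
    return below_midpoint > len(ratings) / 2


# Hypothetical ratings for two items, eight respondents each.
item_a = ["D", "SD", "D", "SD", "D", "A", "SA", "U"]   # five below midpoint
item_b = ["D", "SD", "D", "SD", "A", "A", "SA", "U"]   # four below midpoint
print(is_non_essential(item_a), is_non_essential(item_b))
```

With eight respondents, the rule flags an item only at five or more Disagree and Strongly Disagree marks, which agrees with the threshold quoted in the text.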

The remaining 155 items were then revised as recommended by the
reviewers and submitted to the ENT technician instructional staff for
review, revision, and additions. Instructors were requested to perform
two functions: (1) verify that the remaining test items correspond to
instructional objectives, or revise one or the other so that they do
correspond; and (2) propose additional test items to measure objectives
not represented by the remaining test items, indicating the objective
being measured. The revised and new test items are then submitted for
field validation in the manner described above, the process being an
iterative one.

Validation  of  performance  rating  scale 

Since a large portion of ENT technician instruction consists of skill
development, assessing the level at which these skills are performed in
the real-life/actual job setting is the most appropriate measure of
the adequacy of instruction for these skills. Field validation of the skills
for which ENT technicians were trained was initiated through a training
follow-up or feedback instrument. Because of the diversity of tasks for
which the technician is trained (primarily due to his assisting in
either or both clinic and operating room settings), two forms were developed,
one related to clinic tasks and one for tasks performed in the operating
room. Initial review and refinement of task statements was accomplished
in cooperation with an ENT physician and a senior ENT technician.

Within  30  days  of  completion  of  instruction,  training  follow-up  forms 
were  sent  to  the  duty  station  of  the  graduates  of  the  school.  Otolaryngol- 
ogists supervising  the  recent  graduates  were  requested  in  a  cover  letter  to 
complete the forms for purposes of assisting "in determining the relevance
of the Otolaryngology Technician training curriculum."

Each  of  the  two  training  follow-up  forms  consisted  of  a  list  of  tasks 
for  which  the  technician  is  trained,  each  task  followed  by  three  response 
columns. Specific instructions to the physician completing the form
included:


Attached is a list of tasks an ENT technician may be required
to do in the clinic (or operating room). If the specified
technician is assigned for the equivalent of one day or more
per week to the clinic (or operating room), this Clinic (or
Operating Room) Assignment form should be completed for him/
her. . . .

In  the  Columns  numbered  I,  II,  and  III  following  each  task, 
indicate  specific  information  about  the  technician's  perform- 
ance of  that  task. 

Column  I:        "Does  the  technician  perform  the  task?"  Mark 
an  "X"  under  either  "YES"  or  "NO",  whichever 
is  appropriate. 

Column II:      Use this column only if the technician performs
the  task  (if  you  marked  an  "X"  under  "YES"  in 
Column  I) . 

"How  well  does  the  technician  do  the  task?"  Mark 
an  "X"  in  the  block  under  the  term  that  best 
describes  the  quality  of  this  performance,  namely, 
EXCELLENTLY,  ADEQUATELY,  or  INADEQUATELY. 

Column  III:     Use  this  column  only  if  the  technician  does  not 

perform  the  task  (if  you  marked  an  "X"  under  "NO" 
in  Column  I) . 

"What  is  the  reason  that  the  technician  does  not 
perform the task?" Mark an "X" to indicate which
of  the  following  reasons  is  appropriate: 

1.  The  technician  says  he/she  wasn't  taught  how 
to  do  it. 

2.  The  technician  doesn't  know  how  to  do  it. 

3.  Operating  room  procedures,  or  your  way  of 
practice,  does  not  require  the  technician 
to  do  the  task. 

Allowing  for  incompleteness  in  the  list  of  tasks,  the  specialist  was 
also  requested  to  supply  in  the  space  provided  under  Additional  Duties, 
those  tasks  that  the  technician  does  which  were  not  included  in  the  list. 
A  Comments  section  was  also  provided,  with  the  specific  request  that  the 
physician  give  general  suggestions  he  may  have  regarding  the  follow-up 
itself.    Depending  on  the  actual  work  assignments  of  the  technician, 
either  the  clinic  or  operating  room  form  (or  both)  was/were  completed  for 
each  graduate. 


Summary of performance data


The  initial  group  of  recent  graduates  whose  performance  was  assessed 
consisted of 14 ENT technicians. For this group, responses for 12 were
received.    A  frequency  tally  was  done  of  the  responses  to  the  three 
questions  asked:     if  the  technician  performs  the  task,  how  well  he  performs, 
or  why  he  doesn't  perform  it.     Totaling  the  responses  initially  provided 
the  following  data: 

(1)  the  number  of  recently  graduated  ENT  technicians  who  perform 
and  do  not  perform  each  listed  task; 

(2)  the  number  who  are  judged  to  perform  each  task  at  the  three 
specified levels of competence; and

(3)  the  number  of  technicians  who  do  not  perform  the  tasks  for 
each  of  various  reasons. 

Summary  descriptive  statistics  were  then  calculated  for  each  task,  pro- 
viding the  following  further  data: 

(1)  the  proportion  (or  percentage)  who  perform  and  do  not  perform 
each  task; 

(2)  the  average  (median)  competence  rating  given  for  those  who 
perform  each  task  (for  purposes  of  calculation,  a  rating  of 
Excellently was converted to 3, Adequately to 2, and Inadequately
to  1) ;  and 

(3)  an  index  of  variability  in  ratings  (semi-interquartile  range) 
using  the  same  numerical  conversions. 

Additional  tasks  supplied  by  the  physicians  were  summarized  in  the  same  way. 
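
The summary statistics listed above can be computed as in the following Python sketch. The numerical conversions (3, 2, 1) come from the text; the function name, the response encoding (None for a graduate who does not perform the task), the choice of the "inclusive" quartile method, and the sample responses are assumptions made for the example.

```python
from statistics import median, quantiles

# Numerical conversions stated in the text.
RATING_VALUES = {"EXCELLENTLY": 3, "ADEQUATELY": 2, "INADEQUATELY": 1}


def summarize_task(responses):
    """Summarize follow-up responses for one task.

    Each response is a key of RATING_VALUES when the graduate performs
    the task, or None when the graduate does not perform it.
    """
    if not responses:
        raise ValueError("at least one response is required")
    scores = sorted(RATING_VALUES[r] for r in responses if r is not None)
    summary = {
        "proportion_performing": len(scores) / len(responses),
        "median_rating": None,
        "semi_interquartile_range": None,
    }
    if scores:
        summary["median_rating"] = median(scores)
    if len(scores) >= 2:
        # Semi-interquartile range: half the distance between Q1 and Q3.
        q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        summary["semi_interquartile_range"] = (q3 - q1) / 2
    return summary


# Hypothetical responses for one task from eight supervising physicians.
example = summarize_task([
    "EXCELLENTLY", "EXCELLENTLY", "ADEQUATELY", "ADEQUATELY",
    "ADEQUATELY", "INADEQUATELY", None, None,
])
print(example)
```

The median and semi-interquartile range suit a three-point ordinal scale better than a mean and standard deviation would, which is presumably why the authors chose them; different quartile conventions give slightly different ranges on samples this small.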


Application  of  data 

The  actual  number  of  responses  from  otolaryngologists  to  the  field 
survey  of  the  essentialness  of  test  item  content  and  the  performance  of 
recent  graduates  is  insufficient  to  warrant  extensive  curricular  revision. 
The  process,  however,  is  being  repeated  for  additional  tests  and  subsequent 
graduating  classes  to  substantiate  trends  and  to  clarify  topics  and  tasks 
for which responses varied greatly. The manner in which these types of data
can be utilized for validation and revision of tests and instruction is
straightforward, however.


Test  validation  and  revision 

1.  Written test items judged to contain non-essential information are
eliminated from the usable item pool.

2.  Test items that are judged to contain inaccuracies, based on comments
of specialty practitioners, are revised accordingly and validated, as are
new items.

3.  Similarly, tasks performed by those completing instruction constitute
the list of tasks for which others should be trained, and, therefore,
performance of those tasks is what is testable.

4.  A pool of field validated items and tasks is maintained from which
tests for specific purposes and according to specified parameters can be
drawn.

5.  Periodic re-validation of testing instruments will be implemented so
that changes in knowledge and technology can be represented and
incorporated.


Instructional  validation  and  revision 

1.  Those  areas  of  content  judged  essential  and  the  skills  reported  as 
functional  by  field  practitioners  form  the  basis  for  instruction. 

2.  Content judged non-essential and tasks not performed are
removed from instruction (unless emergency or contingency considerations
require that they be retained).

3.  Recommendations for additions or deletions to instruction or testing
are compared with data from field practitioners. If validation data is
not available, it is collected using one of the previously described
procedures.


Conclusions 

While procedures for revising instruction and testing are often
organizationally specific and tied to considerations not at all a part of
the educational process, the need to base such revisions firmly on
real-world considerations is almost too obvious to mention. The
all-too-common and cyclic process of the instructor determining what should
be instructed, most often with real sincerity, believing he is the best
judge of what should be taught because he has been teaching it for N years,
needs to become instead an interactive process. Obtaining and incorporating
"field" data into instruction and instructional development is essential
and efficient if your goal is validity.


What frequency span is used for the short increment
sensitivity index?

a.  4K, 2K, 1K, 500Hz, 250Hz

b.  6K, 4K, 3K, 500Hz, 250Hz

c.  6K, 4K, 3K, 2K, 500Hz, 250Hz

d.  6K, 4K, 2K, 1K, 500Hz, 250Hz


Monosyllabic, phonetically balanced, and equally difficult
words are characteristics of the ________ test.

a.  short increment sensitivity index

b.  Stenger

c.  speech reception threshold

d.  speech discrimination


What is the most efficient type of masking noise for pure
tones?

a.  speech

b.  sawtooth

c.  white

d.  narrow band


Table 1.1


AUDIOLOGY

Rating statement for each item: "The item tests information essential
to job performance."     SA   A   U   D   SD

For items 1-4 select from Column B the term which best fits the
definition in Column A.

Column A

1.  ___  a device designed to determine the quantity of hearing

2.  ___  transmission of sound stimuli to the eardrum via the external
ear canal

3.  ___  hearing loss caused by decreased sensitivity of the end organ
of hearing

4.  ___  transmission of sound vibrations to the inner ear via the bones
of the skull

Column B

a.  air conduction
b.  sensorineural hearing loss
c.  bone conduction
d.  audiometer
e.  conductive hearing loss

5.  What examination determines the ability of a patient to understand
what he hears?

a.  speech discrimination

b.  speech reception threshold

c.  short increment sensitivity index

d.  Stenger test

Table 1.2


OTOLARYNGOLOGY (ENT) TECHNICIAN ([illegible]446) TRAINING FOLLOW-UP

O.R. ASSIGNMENT

ENT Technician ________     Station ________     Rater ________

Column headings:  I. Performs task     II. If Yes, how well
III. If No, reason

[Remainder of rating form not legible in the source.]
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SCHEDULING  FORMAL  SCHOOL  TRAINING 
TO  MAXIMIZE  COST  EFFECTIVENESS 

DOUG  GOODGAME 
OCCUPATIONAL  RESEARCH  PROGRAM 
TEXAS  A&M  UNIVERSITY 


ABSTRACT 

Procedures for designing instructional systems which rely upon the job
inventory method to collect occupational data from incumbent workers and
job supervisors can, in the data analysis phase, provide the designer
information for making decisions on cost effective scheduling of formal
school training. Two situations are presented to substantiate this assertion.
One situation describes correlational relationships between task factor
ratings (measuring work requirements at the job site) which dictate that
formal school training should be scheduled prior to job assignment. The
second situation reveals relationships whereby formal school training may
be delayed indefinitely.

The results of three occupational studies are reviewed to demonstrate
sample applications. The studies reveal that uniform relationships do not
exist across work requirements in similar occupations and indicate that
unique conditions in the work environment affect relationships between
work requirements.


I  INTRODUCTION  AND  STATEMENT  OF  PROBLEM

The purpose of this article is to demonstrate a method whereby
instructional system designers can determine if formal school training
should be scheduled prior to job assignment. In the event formal school
training can be delayed to a period after job assignment, the designers
will then be able to develop less costly forms of training and implement
the training during an on-the-job training phase. Cost of training is often
related to the location where training is delivered. These locations can
include, but are not necessarily limited to, the following:

On The Job: Training experiences are directly keyed to job actions and
easy to learn job practices and procedures. Supervisors and
senior workers control the content, pace and sequence of instruction.

Agency Classroom: Training experiences are often keyed to policy,
procedures and specialized job functions of the employing organization.
Management and staff from the employing agency control content,
pace, and sequence of instruction.

Remote Classroom: Training experiences are directed to those work
behaviors and technologies most difficult to learn. Training at
this location (referred to as formal school) represents a pooling
or sharing of training resources where instructional specialists
control the content, pace and sequence of instruction.

The most costly training occurs in the formal school setting at
the remote classroom. Training costs at this location are the cumulative
result of the trainee's loss of production while attending the school, or
the cost to replace the trainee with a worker of comparable ability.
Additional costs include the trainee's travel and per diem plus the cost to
support instructional resources at the remote location. Many of these costs
can be minimized if initial training can be delivered on-the-job or in the
agency classroom, reserving formal school training to a later, more cost
convenient period. Appropriate on-job and agency classroom training can
also reduce time and cost to administer formal school training by
addressing skills and knowledges that are readily learned in those training
environments. In addition, work experience at the job site can provide
valuable learning experiences and develop a foundation and frame of
reference for formal school training.

It is not always possible to delay formal school training to a later,
more cost convenient period in a trainee's work experience. In many
situations work requirements at the job site necessitate the acquisition of
critical knowledge and skills before a worker can function productively in
the assigned work environment.

Determining if formal school training can be delayed without violating
critical work requirements at the job site is the central problem to be
addressed by this article. The solution to this problem requires an analysis
of occupational data measuring work requirements of tasks performed by
incumbent workers at the job site.




II  REVIEW  OF  RELEVANT  LITERATURE 


Considerable research has been conducted to develop appropriate
methodologies for designing instructional systems to solve training
problems (1) (8). To date, the effort has concentrated on job-task analysis
techniques and procedures for translating results of task analysis into
curriculum. These activities mark the beginning points for instructional
system design. Designers often assume that the end product will be
delivered in the most cost effective manner on a schedule consistent with
work requirements at the job site. Too often well designed instructional
systems are not delivered on a schedule consistent with work requirements.
This is unfortunate, for designers are now beginning to collect the types
of occupational data which make such determinations possible.

An investigation to determine if formal school training should be
scheduled prior to job assignment can be a by-product of standard
procedures for conducting job analytic studies in an occupational area.
There is little additional work required of a designer of instructional
systems provided the designer follows recommended procedures and
analyzes specific types of occupational data using job or task inventories.

III  METHODOLOGY


Data Requirements


Numerous organizations presently follow recommended procedures for
constructing job or task inventories which enable large samples of
workers in an occupational field to report performance and non-performance
of tasks across their job domain. A job or task inventory, when properly
developed, will contain a listing of all tasks performed by incumbent
workers in a specific job domain. Each incumbent can then use the task
inventory to report the unique set of tasks performed at the job site.
Task level job descriptions can be computed for a group of incumbent
workers to report the percentage of workers performing each task.

The percentage performing value is a vital measure of emphasis on task
performance at the job site and identifies what workers do and do not do
as they routinely perform their work assignments. Resultant values produce
a data vector across all tasks in the job domain with values ranging from
0% to 100%. This data vector (percentage of members performing tasks) is
treated as a task factor and will be referred to in an abbreviated form as
"PERP" in this article.
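
As a minimal sketch of how the PERP vector could be computed from inventory returns, assume each incumbent's response is stored as a row of true/false values, one per inventory task. The function name `percent_performing` and this data layout are illustrative assumptions, not part of any occupational analysis package.

```python
def percent_performing(responses):
    """Return the PERP vector: percent of incumbents performing each task.

    `responses` has one row per incumbent; each row is a list of booleans,
    one per inventory task (True means the incumbent performs the task).
    """
    n_incumbents = len(responses)
    n_tasks = len(responses[0])
    return [
        100.0 * sum(row[t] for row in responses) / n_incumbents
        for t in range(n_tasks)
    ]
```

Each element of the returned list lies between 0 and 100, so the result is directly comparable with the mean-rating vectors for the other three task factors.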

Job analytic studies, conducted to design instructional systems, also
require that certain types of occupational data be collected from experienced
job supervisors to define critical work and training requirements of tasks.
To accomplish this, job supervisors review each task and report ratings
(using specially designed Likert scales) on each of the following task
factors:

-Task learning difficulty (TLD): time required to learn to perform a
task satisfactorily (low scale values equal short learning periods).

-Task delay tolerance (TDT): delay time tolerated prior to beginning
performance of a task once the incumbent observes that the task must be
performed (low scale values equal short delay times).

-Consequences of performance failure (CPF): severity of consequences
of inadequate performance of a task (low scale values equal
inconsequential results).

Inter-rater reliability coefficients should be computed on ratings
from each factor to identify and delete unreliable raters from the
investigation (6). Resultant means provide a measure of the work
requirements for each task on each task factor and establish a data vector
for each factor.
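
The screening procedure itself is given in reference (6) and is not reproduced in this article. Purely as an illustration, the sketch below assumes one simple stand-in rule: a rater is kept only if his ratings correlate at least `min_r` with the mean ratings of the remaining raters. The rule, the cutoff of 0.30, and the function names are all assumptions. The mean vector is then computed from the retained raters.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no ordering information
    return sxy / sqrt(sxx * syy)


def task_means(ratings):
    """Mean rating per task; `ratings` has one row of task ratings per rater."""
    return [sum(col) / len(col) for col in zip(*ratings)]


def screen_raters(ratings, min_r=0.30):
    """Keep raters who agree with the mean of the other raters.

    `min_r` is a hypothetical cutoff chosen for illustration only.
    """
    kept = []
    for i, row in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != i]
        if pearson(row, task_means(others)) >= min_r:
            kept.append(row)
    return kept
```

Calling `task_means(screen_raters(ratings))` on the supervisor ratings for one factor would yield that factor's data vector under these assumptions.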

Recent studies have demonstrated that data vectors for the
four task factors presented in this article (PERP, TLD, TDT, and CPF)
can account for 80 to 90 percent of the variance in a criterion data
vector representing reliable ratings on training priority of tasks
(3) (4) (5). It is evident that a task's estimated priority for training
is a function of: a) emphasis on task performance at job site, b) task
delay tolerance, c) task learning difficulty and d) consequences of task
performance failure.

The  associative  variations  among  these  task  factors  (factor  vectors) 
can  present  very  intriguing  glimpses  into  the  work  requirements  for  an 
occupation. These variations, in a correlational framework, can allow
designers  of  instructional  systems  to  determine  if  delay  can  be  tolerated 
in  delivering  formal  school  training.  In  this  regard,  the  next  section 
presents  two  examples:  the  first  identifies  certain  relationships  among 
work  requirements  that  necessitate  delivery  of  formal  school  training 
prior  to  job  assignment,  and  the  second  identifies  an  opposite  set  of 
relationships  indicating  that  formal  school  training  can  be  delayed 
indefinitely. 

IV  PROCEDURES  FOR  ANALYSING  DATA 

The  first  step  in  analysing  the  work  requirements  of  an  occupation 
relative  to  the  four  task  factors  requires  computing  and  reporting  a 
correlation  matrix.  The  matrix  reports  the  Pearson  product-moment  correla- 
tion coefficient  between  each  factor  vector  and  all  other  factor  vectors 
in  an  occupational  study. 
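
A minimal sketch of this first step follows, assuming each factor vector is held as a plain list with one value per task, in the same task order. The function names are illustrative; the coefficient is the ordinary Pearson product-moment correlation named in the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(vectors):
    """Correlate every factor vector with every other factor vector.

    `vectors` maps a factor name (e.g. "PERP", "TDT", "TLD", "CPF") to its
    per-task values. Returns a nested dict: matrix[a][b] is r between a and b.
    """
    names = list(vectors)
    return {
        a: {b: pearson(vectors[a], vectors[b]) for b in names}
        for a in names
    }
```

With four factors the matrix holds six distinct off-diagonal coefficients, which are the six relationships examined in the remainder of this section.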

The  correlates  between  factor  vectors  in  an  occupational  study  can 
reveal  to  the  designer  relationships  between  work  requirements  at  the  job 
site,  which,  in  turn,  can  help  the  designer  determine  whether  delay  in 
formal  school  training  would  seriously  violate  work  requirements  of  tasks 
routinely  performed  at  the  job  site. 

The  following  is  being  presented  as  an  example  of  a  situation  where 
formal  school  training,  at  the  remote  location,  should  be  scheduled  prior 
to  job  assignment.  The  correlates  in  a  model  matrix  should  indicate  that: 

1.  PERP is negatively correlated with TDT: This implies that for tasks
performed by a majority of workers, the workers have little delay
time in initiating performance of the tasks once the workers observe
that the task has to be performed. It also implies that workers may
not have time to consult a supervisor or senior worker or look up
a procedure in a manual before initiating performance of the task.

2.  PERP is positively correlated with TLD: This implies that tasks
performed by a majority of workers are difficult to learn to perform,
difficulty being expressed as time required to learn to perform a
task satisfactorily.

3.  PERP is positively correlated with CPF: This implies that tasks
performed by a majority of the incumbent workers will result in
severe consequences if not performed correctly.

4.  TDT is negatively correlated with TLD: This correlation indicates
that tasks with low time delay tolerances require longer periods
of learning time.

5.  TDT is negatively correlated with CPF: This implies that tasks with
low time delay tolerances will result in severe consequences if
performance failure occurs.

6.  TLD is positively correlated with CPF: This correlation implies
that tasks which are difficult to learn to perform correctly will
result in severe consequences in the event of performance failure.
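
The six conditions above amount to a pattern of signs, which can be checked mechanically. The sketch below is illustrative only: the names `SCHOOL_FIRST_SIGNS`, `DELAY_SIGNS`, and `matches_model` are assumptions, and the check looks at sign alone, whereas the text also asks that the coefficients be of high magnitude.

```python
# Sign expected for each pair of factor vectors under the model in which
# formal school training should precede job assignment.
SCHOOL_FIRST_SIGNS = {
    ("PERP", "TDT"): -1,
    ("PERP", "TLD"): +1,
    ("PERP", "CPF"): +1,
    ("TDT", "TLD"): -1,
    ("TDT", "CPF"): -1,
    ("TLD", "CPF"): +1,
}

# Reversing every sign gives the opposite model, in which formal school
# training could be delayed.
DELAY_SIGNS = {pair: -sign for pair, sign in SCHOOL_FIRST_SIGNS.items()}


def matches_model(correlations, expected_signs):
    """True when every coefficient carries the sign the model expects.

    `correlations` maps a (factor, factor) pair to its coefficient.
    """
    return all(
        sign * correlations[pair] > 0
        for pair, sign in expected_signs.items()
    )
```

An occupation whose matrix matches neither pattern falls between the two extremes, and the designer must weigh the individual relationships.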

Correlates  of  high  magnitude  in  the  above  example  would  apply  to  few 
jobs  in  our  work  society.  It  is  highly  probable  the  correlates  would  apply 
to  tasks  performed  by  emergency  medical  service  personnel  and  firefighters 
to  name  two  occupations  where  the  job  demands  are  exceedingly  rigorous  with 
task  performance  constrained  by  low  time  delays.  It  is  conceivable  that  an 
analysis  of  occupational  data  from  these  two  areas  would  indicate  that  formal 
school  training  should  occur  prior  to  job  assignment. 

A reverse in the signs associated with the correlates between factor
vectors presented in the model matrix will establish the boundaries for a
second matrix. This second model would indicate a high probability that
formal school training could be delayed indefinitely. Such a reverse implies
that a majority of the tasks performed by workers will exhibit high task
delay tolerance values, be easy to learn to perform, and produce
inconsequential results if performance failure occurs. In addition, tasks
with low time delay tolerances will be easy to learn to perform and will
produce inconsequential results if performance failure occurs. Also, tasks
that are difficult to learn to perform will produce results in which
performance failures will be inconsequential.

The  two  correlational  models  presented  in  this  section  represent 
situations  in  which  occupational  work  requirements  dictate  two  extremes. 
The  first  model  implies  that  formal  school  training  should  be  scheduled 
for  new  employees  prior  to  job  assignment,  since  an  analysis  of  work 
requirements  indicates  that  a  new  employee  would  have  difficulty  perform- 
ing assigned  tasks  without  special  training.  The  second  model  implies  that 
formal  school  training  could  be  delayed  indefinitely,  since  the  analysis  of 
work  requirements  indicates  a  high  probability  that  a  new  employee  would 
not  have  any  difficulty  learning  to  perform  assigned  tasks  at  the  job  site. 

In  the  next  section  correlates  between  factor  vectors  generated  from 
three  occupational  fields  are  reviewed  to  demonstrate  field  application 
of  the  process. 

V  THREE  SAMPLE  APPLICATIONS

The Occupational Research Program at Texas A&M University recently
conducted job analytic studies in three criminal justice occupations to
derive training requirements for designing instructional systems. In one
study 295 tasks performed by 258 county detention officers were analyzed
(5). Tasks performed in county detention centers are closely related to
tasks performed by correctional officers in state and federal correctional
institutions. Generally, county detention officers process prisoners into
the center, supervise the custody of inmates housed in cell blocks and
process prisoners for release from custody.

A second study investigated the work performed by 121 sheriffs'
deputies (4). A portion of this study focused upon 423 tasks performed by
deputies working in counties with less than 40,000 population. These
officers perform a myriad of county law enforcement and public service
tasks.

The  third  study  analyzed  355  tasks  performed  by  47  field  sergeants 
working  in  police  departments  serving  highly  populated  cities  (3).  These 
officers  supervise  the  work  of  uniformed  patrolmen  who  provide  law  enforce- 
ment and  public  assistance  services  to  municipal  government. 

Table I reports a matrix of correlates between factor vectors across
three occupations. The notation "PERP X TDT" in item 1 below refers to
the two factor vectors of interest. The notation r_A, r_B, and r_C refers
to the correlates in each occupational field (occupations A, B, and C as
listed in Table I) relative to the factor vectors of interest. A review of
the findings indicates that:

1.  PERP X TDT: (r_A = -.45, r_B = -.50, & r_C = -.35)

A  majority  of  the  officers  in  each  occupation  perform  tasks  where 
low  time  delays  are  tolerated  prior  to  initiating  performance  of 
a  task  once  an  officer  observes  that  a  task  has  to  be  performed. 
This  implies  that  officers  may  not  have  time  to  seek  assistance  or 
guidance  from  supervisors  or  fellow  officers  on  how  to  perform  a 
task,  nor  be  able  to  look  up  a  procedure  in  a  manual. 

2.  PERP X TLD: (r_A = -.46, r_B = .35, & r_C = -.17)

A majority of the officers in occupations A and C perform tasks
which are relatively easy to learn to perform as indicated by the
negative coefficients. This is not the case with deputy sheriffs
working  in  less  populated  counties.  Here,  a  positive  coefficient 
implies  the  tasks  performed  by  a  majority  of  the  officers  are 
difficult  to  learn  to  perform;  a  reverse  of  the  situation  normally 
expected  of  workers  in  an  entry  level  position.  It  is  generally 
understood  that  these  deputies  perform  a  wide  range  of  tasks  which 
in  larger  counties  would  be  performed  by  senior  deputies  or 
specialists. 

3.  PERP X CPF: (r_A = .24, r_B = .48, & r_C = -.02*)

A majority of the officers in occupations A and B perform tasks
in which the consequences of performance failure were deemed very
severe. This is evidently not true for officers in occupation C.
The job descriptions for officers in this field revealed that
field sergeants continue to perform many line tasks, line tasks
being the type of work normally performed by uniformed patrolmen.

4.  TDT X TLD: (r_A = -.06*, r_B = -.34, & r_C = -.20)

It appears for occupations B and C that a significant negative
correlation exists between the length of time required to learn
to perform a task and the delay time tolerated to initiate
performance at the job site. This was not a characteristic of the
relationship between tasks performed by county detention officers
as evidenced by the low coefficient r_A = -.06.

5.  TDT X CPF: (r_A = -.77, r_B = -.76, & r_C = -.59)

For these occupations a high negative correlation exists between the
delay time tolerated prior to initiating performance of a task and the
resultant severity if performance failure occurs. This implies
that tasks with low time delay tolerances will produce severe
consequences if not performed correctly.

6.  TLD X CPF: (r_A = .39, r_B = .71, & r_C = .71)

For  these  occupations  a  high  correlation  exists  between  the  time 
required  to  learn  to  perform  a  task  satisfactorily  and  severity 
of  consequences  if  performance  failure  occurs.  Specifically,  this 
indicates  that  tasks  requiring  long  periods  of  learning  time  will, 
if  not  performed  correctly,  produce  severe  consequences. 

*Coefficients were not deemed significant at .05 level.
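
The article does not say which significance test was applied. As a hedged illustration, the sketch below assumes the conventional t test of a Pearson coefficient against zero, and assumes that the number of paired observations equals the number of tasks in each study (295, 423, and 355), since each factor vector holds one value per task. The critical value 1.97 is an approximation of the two-tailed .05 point of t for roughly 300 or more degrees of freedom.

```python
from math import sqrt

# Approximate two-tailed .05 critical value of t for about 300+ degrees of
# freedom (an assumption; an exact table value would depend on n - 2).
T_CRIT = 1.97


def t_statistic(r, n):
    """t statistic for testing a Pearson r against zero with n paired values."""
    return r * sqrt((n - 2) / (1 - r * r))


def is_significant(r, n_tasks):
    """True when |t| exceeds the approximate .05 critical value."""
    return abs(t_statistic(r, n_tasks)) > T_CRIT
```

Under these assumptions the two starred coefficients (-.06 with 295 tasks and -.02 with 355 tasks) fall short of significance, while the smallest unstarred coefficient (-.17 with 355 tasks) exceeds the critical value, which agrees with the footnote.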


VI  CONCLUSIONS 


The correlates report very pronounced relationships between work
requirements in each occupation, but indicate that uniform relationships
do not exist across these occupations. It could have been assumed that
all law enforcement and detention related occupations in criminal justice
career fields would exhibit similar relationships between work
requirements across all occupations.

According to the first correlational model outlined in Section IV,
it would be appropriate for sheriffs' deputies to receive formal school
training prior to job assignment since the work requirements of performed
tasks meet all six critical criteria. It is possible that formal school
training could be delayed for new employees in county detention centers
since a majority of the officers perform tasks which are not difficult
to learn to perform. And, it is conceivable that supervisory training
can be delayed for newly appointed first-line supervisors since a majority
of supervisors continue to perform line tasks which, relative to all tasks
in their job domain, are easy to learn to perform. Also, a majority of
the supervisors perform tasks where consequences of performance failure
ratings (from insignificant to serious) appear randomly distributed
across all tasks.
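
The reasoning in this section can be retraced from the reported coefficients. The sketch below compares each occupation's six coefficients with the signs required by the first correlational model and lists the criteria that are not met. The coefficients are those reported in the article; the variable and function names are illustrative, and the comparison is by sign only.

```python
PAIRS = [
    "PERP x TDT", "PERP x TLD", "PERP x CPF",
    "TDT x TLD", "TDT x CPF", "TLD x CPF",
]

# Signs required by the first model (formal school before job assignment).
EXPECTED_SIGN = [-1, +1, +1, -1, -1, +1]

# Coefficients as reported for occupations A, B, and C, in PAIRS order.
REPORTED = {
    "A": [-0.45, -0.46, 0.24, -0.06, -0.77, 0.39],  # county detention officers
    "B": [-0.50, 0.35, 0.48, -0.34, -0.76, 0.71],   # deputy sheriffs
    "C": [-0.35, -0.17, -0.02, -0.20, -0.59, 0.71], # field sergeants
}


def failed_criteria(coefficients):
    """Return the factor pairs whose coefficient lacks the expected sign."""
    return [
        pair
        for pair, sign, r in zip(PAIRS, EXPECTED_SIGN, coefficients)
        if sign * r <= 0
    ]
```

For occupation B the list is empty, which is the sense in which deputies "meet all six critical criteria." Occupation A fails only on PERP x TLD, and occupation C fails on PERP x TLD and PERP x CPF, matching the grounds given above for possibly delaying formal training in those two fields.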


TABLE  I

Correlates Between Task Factors Across
Three Criminal Justice Occupations

OCCUPATIONS

A = 258 County detention officers, 295 tasks
B = 121 Deputy sheriffs, 423 tasks
C = 47  Field sergeants (municipal police department), 355 tasks

TASK FACTORS (DATA VECTORS)

PERP = Percentage of members performing tasks
TDT  = Task delay tolerance
TLD  = Task learning difficulty
CPF  = Consequences of performance failure


               TDT                   TLD                   CPF
          A     B     C         A     B     C         A     B     C

PERP    -.45  -.50  -.35      -.46   .35  -.17       .24   .48  -.02
TDT      1     1     1        -.06  -.34  -.20      -.77  -.76  -.59
TLD                            1     1     1         .39   .71   .71


VII  SUMMARY 

Designers of instructional systems need to determine if formal school
training should be scheduled prior to job assignment. In the event formal
school training can be delayed, less costly forms of training can often be
instituted at the job site. This training can provide job experiences and
instruction which will benefit the employee during his formal school
experience. The job experience will provide a frame of reference to make
formal school training more job related, and instruction at the job site and
agency classroom can build knowledge and skills which may permit reduction
in the amount of time required to deliver formal school training.

Present  procedures  for  designing  instructional  systems  incorporate 
techniques  for  collecting  data  to  validate  the  job  relatedness  of  proposed 
training  curriculum  and  can  define  critical  tasks  which  new  employees  should 
be  trained  to  perform.  This  same  data,  when  analyzed  in  a  correlation  matrix, 
can  offer  a  designer  of  instructional  systems  insights  into  the  critical 
work  requirements  of  tasks  distributed  across  a  specific  job  domain. 


Determining if formal school training can be delayed requires a special
analysis of the critical work requirements at the job site. The analysis
involves computing a matrix of correlates among task-factor vectors
measuring: a) emphasis of task performance at the job site, b) task learning
difficulty, c) task delay tolerance, and d) consequences of performance
failure of tasks. The resultant matrix will enable the designer to assess
relationships between work requirements and determine if formal school
training can be delayed.

Length of delay is a judgment the designer will have to make based on
knowledge of when tasks which are difficult to learn to perform and have
low time delay tolerances become major assignments for new employees. An
advisory committee composed of knowledgeable first line supervisors can
assist the designer in setting time limits, which can vary according to the
work environments at various job sites.


The Air Force Human Resources Laboratory is actively involved
in performing training requirements research using both operational
and experimental occupational survey data. Currently, research is
aimed at providing products to assist training designers in deciding
which tasks should be considered for training in various Air Force
specialties. In addition to the training requirements work, a basic
research study is presently being conducted on the feasibility of
developing a method of prioritizing job tasks in terms of hazard potential,
expected frequency of accidents and other pertinent factors that could
assist training designers in determining needs for safety training.

There are several similarities between the objectives of the
safety training research and the training requirements research. Both
streams of research endeavor to define certain task factors that will
prove to be predictive of training requirements. Both efforts employ
the regression modeling approach, a method that is more thoroughly
discussed by Ruck (1978). Also, both projects share the goal of
contributing meaningfully to the job relevancy of Air Force training
programs.

The safety training research described in this paper is in response
to a request from the Air Force Inspection and Safety Center (AFISC)
at Norton Air Force Base. The objective is to provide the AFISC with
information to help prevent on-the-job accidents that result in injuries,
loss of equipment, loss of time and loss of materials. The approach
is to collect hazard potential ratings for technical tasks and determine
the extent to which these hazard potential ratings (and a number of
other task factor ratings) can predict accidents on the job. The
purpose of this paper is to present an approach to working with accident
data, to discuss some of the problems associated with this type of
data and to discuss future directions for an expanded study of accident
data in various career fields. It is important to note that the problem
addressed in this paper has to do with "what tasks will have accidents"
and not with "which people will have accidents." Therefore, the question
addressed in this paper is somewhat different from that normally considered
in safety research.


Approach


The approach taken in the safety training priority research was
to define task factors believed or known to be predictive of accidents,
to collect task factor ratings from experts, to prioritize job tasks
in terms of need for safety training, and to develop regression models
with predictive efficiency for safety training. Data were analyzed
using the comprehensive occupational data analysis programs (CODAP)
(Christal & Weissmuller, 1976).

Several  alternatives  to  determining  characteristics  peculiar 
to  safety  training  were  considered.    Based  on  previous  training  priorities 
research,  ratings  of  consequences  of  inadequate  performance,  task 
delay  tolerance  and  task  difficulty  were  included  in  the  analysis. 
Consideration  was  given  to  a  scale  that  would  yield  ratings  of  safety 
training  requirements,  but  was  rejected  because  the  scale  was  not 
clearly  related  to  the  problem.    Another  possible  approach  was  to 
use  only  tasks  which  had  been  involved  with  accidents;  however,  this 
approach  did  not  address  tasks  that  might  have  been  potentially  hazardous 
but  had  not  had  any  occurrence  of  accidents.    Ultimately  a  new  task 
factor  scale  was  devised  to  measure  the  hazard  potential  of  tasks. 
The approach for this initial study is promising in that rater response
has  been  good  and  initial  results  are  encouraging. 

Data  Collection 

The  aircraft  armament  specialty  (AFSC  462X0,  previously  called 
weapons  mechanic),  was  chosen  for  the  present  study.    The  aircraft 
armament  career  ladder  consists  of  12,669  incumbents,  2,588  of  which 
serve  at  a  supervisory  skill-level.    The  major  job  groups  for  non- 
supervisory  incumbents  are  weapons  loader  (72%),  weapons  release  (18%), 
and  gun  services  (10%).    Each  job  incumbent  performs  an  average  of 
70 tasks out of a possible 527 tasks included in the job inventory.
Twenty-nine percent of the time spent by job incumbents is on
supervisory functions; 28% of their time is spent on loading functions;
and  15%  is  spent  on  flight  line  inspections  and  operational  checks. 

Criterion  data  were  extracted  from  accident  reports  that  were 
supplied  by  the  AFISC.    Among  various  variables,  the  reports  provided 
the  accident  location  and  date,  the  cost  per  accident,  and  a  narrative 
describing  the  accident.    These  reports  were  reviewed  by  a  person 
knowledgeable  in  the  aircraft  armament  specialty  and  the  accidents 
were  matched  with  the  tasks  that  were  being  performed  when  the  accidents 
occurred.    The  number  of  accidents  per  task  was  then  established  for 
each  of  527  tasks  as  listed  in  the  job  inventory  for  462X0.    As  is 
frequently  discovered  when  dealing  with  accident  data,  the  ratio  of 
accidents  to  tasks  was  very  low.    In  a  time  frame  beginning  in  July, 
1975,  and  ending  in  December,  1976,  a  total  of  only  49  accidents  was 
reported  that  could  be  related  to  the  job  inventory  for  the  aircraft 
armament  specialty.    Furthermore,  these  49  accidents  were  associated 
with only 20 tasks from the 527 tasks included in the inventory. When
the number of accidents was broken down by duty, it was found that
just over half (26) of the accidents occurred while performing loading
tasks.    Other  accident  related  duties  were:     (a)  performing  operational 
checks  of  aircraft  suspension,  release,  launch,  and  monitor  and  control 
systems  (10  accidents);  (b)  shipping  and  transporting  munitions  (8 
accidents);  (c)  performing  flight  line  inspections  of  aircraft  suspension, 
release,  launch,  and  monitor  and  control  components  (2  accidents); 
(d)  performing  flight  line  maintenance  of  gun  systems  (2  accidents); 
and (e) removing, installing, and replacing aircraft suspension, release,
launch, and monitor and control components (1 accident).

Several  task  factors  were  collected  from  the  supervisors  in  the 
field.    These  factors  included  consequences  of  inadequate  performance, 
task  delay  tolerance,  and  task  difficulty.    The  development  of  these 
factors  was  described  by  Mead  (1975)  at  the  17th  annual  conference 
of  the  Military  Testing  Association. 

Since this study was concerned with safety training, a fourth
task factor, hazard potential, was added. The hazard potential
factor was suggested by a study which evaluated human effects on nuclear
systems safety (Askren, Campbell, Seifert, Hall, Johnson, & Sulzen,
1976). The hazard potential scale was designed to determine which
tasks  are  more  hazardous  to  perform  than  others  and  might,  therefore, 
cause  accidents.    If  the  raters  agree  that  certain  tasks  are  more 
hazardous  to  perform  than  others,  then  safety  training  can  be  recommended 
for  those  tasks. 

The nine-point hazard potential scale ranges from extremely low
hazard potential through extremely high hazard potential. The hazard
scale was sent to seven and nine skill-level supervisors, each of whom
was asked first to check only those tasks in the inventory which he or
she considered to be potentially hazardous. The rater was then asked
to rate the checked tasks on a scale from 1 to 9 to indicate how potentially
hazardous each task is. For analysis purposes the scale was treated as
a 10-point scale (0-9) because a task not checked was given a value of zero,
indicating no hazard potential. Appendix A illustrates the rating
scale. Appendix B presents the inter-rater agreement indices for
a sample size of 50 raters (based on the Spearman-Brown formula) for all
task factors.
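
The Spearman-Brown step-up mentioned above can be sketched as follows. This is a minimal illustration, not the CODAP computation itself; the function names are hypothetical, and it assumes the Appendix B indices are the reliability of the mean of k = 50 raters.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the mean rating of k raters, given the
    reliability (average agreement) of a single rater."""
    return k * r_single / (1.0 + (k - 1) * r_single)


def single_rater_reliability(r_kk: float, k: int) -> float:
    """Invert the Spearman-Brown formula: recover the single-rater
    value implied by a k-rater index r_kk."""
    return r_kk / (k - r_kk * (k - 1))


# Hazard potential index reported in Appendix B for 50 raters.
r_11 = single_rater_reliability(0.9315, 50)
print(f"implied single-rater reliability: {r_11:.3f}")
print(f"stepped back up to 50 raters:     {spearman_brown(r_11, 50):.4f}")
```

Under these assumptions, a high 50-rater index can correspond to a fairly modest single-rater value, which is why the number of raters should be reported alongside the index.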

In addition to the four task factors, two other variables which
had  previously  been  collected  in  a  routine  occupational  survey  were 
considered:    percent  members  with  1-48  months  total  active  military 
service  performing  each  task  and  an  index  of  percent  of  time  spent 
by  members  with  1-48  months  service  performing  each  task.  Appendix 
C shows the zero-order correlations among the six variables. Although
the correlation between hazard potential and consequences of inadequate
performance is high (r = .70), there are some differences
in the two factors. Other factors correlate significantly with hazard
potential, but the correlations are not as high. These include percent members
performing (r = .33), percent time spent (r = .35), and task delay tolerance
(r = -.34). The negative relationship with delay is due to the fact
that the delay scale is inverted, with a rating of one rather than
nine being the most critical. The lowest and only nonsignificant
correlation was hazard potential with difficulty (r = -.04). Clearly,
difficulty and hazard potential do not appear to be linearly related
for this specialty.
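
The significance statements above can be checked with the usual t transformation of a correlation coefficient. The sketch below is illustrative only; the cutoff of 1.96 is an assumed large-sample critical value (one-tailed .025), and the paper does not state which test was actually used.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N_TASKS = 527      # number of tasks in the 462X0 job inventory
T_CRITICAL = 1.96  # assumed large-sample cutoff, not taken from the paper

# Correlations of each variable with hazard potential (Appendix C).
correlations = {
    "consequences of inadequate performance": 0.6992,
    "task delay tolerance": -0.3394,
    "task difficulty": -0.0439,
    "percent members performing": 0.3302,
    "percent time spent": 0.3509,
}

for name, r in correlations.items():
    t = t_for_correlation(r, N_TASKS)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{name:40s} r = {r:+.4f}  t = {t:+6.2f}  {verdict}")
```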

Data  Analysis 

A  factor  printout  program  (FACPRT)  was  run  to  produce  all  of 
the  tasks  sorted  in  descending  order  of  hazard  potential  according 
to  the  supervisory  ratings.    The  task  that  the  supervisors  agreed 
was  the  most  hazardous  was  "arm  or  dearm  aircraft  armament  systems 
other  than  guns".    An  extract  from  the  hazard  potential  FACPRT  listing 
is  given  in  Appendix  D. 

The  factor  printout  listing  reflects  the  opinions  of  the  people 
working  in  the  field  and  would  be  highly  useful  for  training  designers. 
However,  it  must  be  noted  that  some  of  the  ratings  may  have  been  affected 
by  the  rater's  knowledge  of  accidents  that  had  already  occurred  on 
certain tasks.

To  take  the  research  a  step  further,  prediction  models  were  con- 
sidered.   As  mentioned  earlier,  a  major  difference  exists  between 
this  study  and  other  safety  research  in  that  this  study  is  predicting 
tasks  that  will  have  accidents  occur  while  the  task  is  being  performed 
rather  than  predicting  who  will  have  an  accident  while  performing 
the  tasks.     Consideration  was  given  to  predicting  the  probability 
of  an  accident  occurring  if  the  task  is  performed  once.    In  order 
to  predict  probabilities  it  would  be  necessary  to  have  frequency  of 
performance  data  for  each  task;  these  data  are  not  available.  Since 
a  considerable  data  collection  effort  would  be  required  to  obtain 
these  data,  the  model  predicting  probability  has  been  deferred  at 
this  time. 

From  the  possible  criteria  available  for  analysis,  frequency 
of  occurrence  of  accidents  per  task  was  considered  the  most  appropriate. 
The  distribution  of  the  criterion  for  this  specialty  was  badly  skewed. 
Of  the  20  tasks  associated  with  accidents,  11  tasks  had  only  1  accident 
occurrence,  3  tasks  had  2  accidents,  2  tasks  had  3  accidents,  1  task 
had  4  accidents,  2  tasks  had  6  accidents,  and  1  task  had  10  accidents. 
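
As a quick consistency check on the criterion described above, the reported tally can be summed in a few lines. This is only an illustrative sketch; the dictionary layout is an assumption made for the example.

```python
# accidents per task -> number of tasks with that many accidents
tasks_by_accident_count = {1: 11, 2: 3, 3: 2, 4: 1, 6: 2, 10: 1}

TOTAL_TASKS = 527  # tasks in the job inventory

tasks_with_accidents = sum(tasks_by_accident_count.values())
total_accidents = sum(n * tasks for n, tasks in tasks_by_accident_count.items())
tasks_without_accidents = TOTAL_TASKS - tasks_with_accidents

print(tasks_with_accidents, total_accidents, tasks_without_accidents)
print(f"share of tasks with no accident: {tasks_without_accidents / TOTAL_TASKS:.1%}")
```

The tally reproduces the 20 tasks and 49 accidents reported earlier, and it makes the skew explicit: the large majority of tasks have a criterion value of zero.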

Three models were developed to investigate the relationships
among three primary predictors (hazard potential, an index of percent
time spent by members with 1-48 months of service, and percent members
with 1-48 months of service performing); six generated variables; and
the frequency criterion. The models will be referred to as full,
exposure, and hazard. Table 1 shows the variables in each model.
In addition, the relative contribution of the hazard potential rating
was evaluated.
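
The six generated variables can be sketched as simple powers and products of the primary predictors. The code below is an illustration under stated assumptions: the function and key names are hypothetical, "Hazard X Time Squared" is read as hazard times the square of time, and the term lists for the exposure and hazard models follow one reading of Table 1.

```python
from typing import Dict, List


def build_terms(hazard: float, members: float, time_spent: float) -> Dict[str, float]:
    """Return the three primary predictors plus six generated terms for one task."""
    return {
        "hazard": hazard,
        "members": members,
        "time": time_spent,
        "hazard_sq": hazard ** 2,
        "members_sq": members ** 2,
        "time_sq": time_spent ** 2,
        "hazard_x_time": hazard * time_spent,
        "hazard_x_time_sq": hazard * time_spent ** 2,  # assumed reading of the label
        "hazard_sq_x_time_sq": hazard ** 2 * time_spent ** 2,
    }


# Assumed term lists for the two reduced models; the full model uses all nine terms.
MODEL_TERMS: Dict[str, List[str]] = {
    "exposure": ["members", "time", "members_sq", "time_sq"],
    "hazard": ["hazard", "hazard_sq"],
}


def model_row(terms: Dict[str, float], model: str) -> List[float]:
    """Select the predictor values that enter the named model."""
    if model == "full":
        return list(terms.values())
    return [terms[name] for name in MODEL_TERMS[model]]


# Example values taken from the top-ranked task in Appendix D.
terms = build_terms(hazard=6.2, members=62.0, time_spent=1.9)
print(model_row(terms, "hazard"))
```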




TABLE 1. VARIABLES INCLUDED IN EACH
OF THE THREE PREDICTION MODELS


Variables                                Full    Exposure    Hazard

Hazard Potential                          X                    X
Percent Members Performing 1-48 mos.      X         X
Percent Time Spent 1-48 mos.              X         X
Hazard Squared                            X                    X
Members Squared                           X         X
Time Squared                              X         X
Hazard X Time                             X
Hazard X Time Squared                     X
Hazard Squared X Time Squared             X


Results 


The full model predicting frequency of occurrence had an R = .70
(p<.001); the exposure model with the percent time and percent members
variables had an R = .68 (p<.001); the third model with hazard and hazard
squared had an R = .38 (p<.001). Considering the three primary predictors,
hazard potential, the index of percent time spent, and percent members
performing, the index of percent time spent on a task accounted for
the most variance in the regression models. Percent time correlates
.42 with frequency, whereas hazard potential correlates only .27 (Appendix E).
However, hazard potential does contribute significantly to the full
model.
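
The multiple correlations above convert to proportions of criterion variance by squaring. The short sketch below only restates that arithmetic; the dictionary names are illustrative.

```python
# Multiple correlations reported for the three models.
multiple_r = {"full": 0.70, "exposure": 0.68, "hazard": 0.38}

# Proportion of variance in accident frequency accounted for by each model.
variance_explained = {name: r ** 2 for name, r in multiple_r.items()}

for name, r2 in variance_explained.items():
    print(f"{name:9s} R = {multiple_r[name]:.2f}  R squared = {r2:.2f}")
```

The full model's R of .70 corresponds to about 49% of the variance, the figure cited in the conclusions, while the hazard model alone accounts for about 14%.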

A  predicted  number  of  accidents  based  on  the  regression  weights 
derived  from  each  of  the  three  models  (full,  exposure,  hazard)  with 
frequency  of  accidents  as  the  criterion  has  been  computed  for  each 
of  the  527  tasks  for  each  model.    Each  of  the  three  sets  of  predicted 
numbers of accidents was ordered in factor printouts from the task
with the highest predicted number through the task with the lowest
predicted number. Appendices F, G, and H are tables showing the cumulative
percentage  of  accidents  occurring  at  different  cumulative  percentages 
of  tasks. 
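
The cumulative percentages can be reproduced from accident counts per band of ranked tasks. The sketch below is illustrative; the function name is hypothetical, and the band counts are the full-model row of the independence table in Appendix I.

```python
from itertools import accumulate
from typing import List


def cumulative_percentages(accidents_per_band: List[int]) -> List[int]:
    """Cumulative percent of all accidents captured as successive bands
    of tasks (ranked by predicted number of accidents) are included."""
    total = sum(accidents_per_band)
    return [round(100 * running / total) for running in accumulate(accidents_per_band)]


# Bands of ranked tasks: top 1%, 2-5%, 6-10%, 11-20%, 21-100%.
full_model_bands = [22, 4, 7, 9, 7]
print(cumulative_percentages(full_model_bands))
```

Read this way, the top 20 percent of tasks under the full model account for roughly 86 percent of the recorded accidents.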

A chi square was run on each of the sets of predicted number
of  accidents  to  test  the  hypothesis  that  the  distribution  of  actual 
accidents  over  predicted  scores  was  no  better  than  chance.    The  accident 
distribution  was  found  to  be  significantly  different  from  chance  (p<.01) 
for  each  set  of  predictors.    A  chi  square  for  independence  among  the 
three  sets  of  predicted  scores  was  significant  (p<.05).    In  addition, 
a  chi  square  for  independence  between  the  full  model  and  the  exposure 
model was significant (p<.05). However, no significant difference between the
full model and the hazard model was found. Appendix I presents the chi square
tables. Although the regression model indicates that the hazard potential
ratings  add  significantly  to  the  prediction,  the  chi  square  analysis, 
a  somewhat  less  powerful  test,  does  not  indicate  significantly  different 
distributions  between  the  full  model  and  the  hazard  model.    A  decision 
to  use  or  not  use  the  hazard  potential  ratings  would  be  based  on  further 
test  results  together  with  the  expense  in  money  and  time  involved 
in  collecting  the  data. 
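
Both kinds of chi square test can be sketched in plain Python. This is a minimal illustration rather than the program used in the study; the function names are hypothetical, and the counts are those legible in Appendix I (hazard model by fifths of the ranked tasks, and the full and hazard models by percentage band).

```python
from typing import List


def chi_square_vs_chance(observed: List[int]) -> float:
    """Goodness of fit against equal expected counts in every cell."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi_square_independence(table: List[List[int]]) -> float:
    """Pearson chi square for a rows-by-columns contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Accidents falling in each successive 20% of tasks, hazard model.
hazard_by_fifths = [43, 0, 5, 1, 0]
print(f"difference from chance: {chi_square_vs_chance(hazard_by_fifths):.2f}")

# Accidents by band (0-1, 2-5, 6-10, 11-20, 21-100 percent of tasks).
full_vs_hazard = [[22, 4, 7, 9, 7],
                  [16, 8, 8, 11, 6]]
print(f"full vs. hazard: {chi_square_independence(full_vs_hazard):.2f}")
```

With these counts the two statistics agree with the values of 142.33 and 2.62 given in Appendix I.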


Conclusions  and  Future  Directions 

The  results  of  the  analysis  applied  to  the  aircraft  armament 
specialty are encouraging. One of the most useful products generated
is  the  factor  printout  of  the  hazardous  tasks  as  rated  by  the  supervisors 
in  the  career  ladder.    This  is  an  easily  understandable  tool  which 
could  be  used  by  the  training  designers.    The  full  regression  model 
that  was  developed  to  predict  expected  frequency  of  accidents  accounts 
for 49% of the variance in this particular specialty. However, it
is  not  yet  known  how  well  the  model  will  hold  up  on  cross  validation. 

Efforts  are  continuing  to  determine  if  the  methods  so  far  developed 
in  the  present  study  are  valid  and  generalizable.    To  that  end,  research 
is  currently  in  progress  to  cross  validate  and  cross  apply  results 
developed  in  the  present  study. 

An additional survey of the 462X0 ladder has been conducted and
is under analysis. The survey was performed to collect field recommended
training emphasis judgments. Field recommended emphasis ratings are
a measure of a task's recommended formal training emphasis, either
school or on-the-job, based upon the ratings of field supervisors.
The interrelationship of this variable with others already collected
will be investigated.

A  cross  validation  study  is  planned  for  the  462X0  ladder.  When 
enough new accident data (18 months' worth) have been collected, the
weights  from  the  frequency  of  performance  model  will  then  be  applied 
to  the  new  data  to  determine  how  well  the  equation  would  predict  in 
the  cross-application. 

The method used for analysis of the aircraft armament specialty
will be repeated for two additional career fields. Surveys are currently
in the field for Fire Protection (571X0) and Fuel (631X0). Results
from these two fields may indicate whether the methods developed have
any  applicability  across  specialties. 

In  general,  the  preliminary  findings  from  this  feasibility  study 
have  been  encouraging.    The  approach  and  the  methods  for  predicting 
tasks  which  will  have  accidents  are  promising.    However,  results  from 
this initial study must be regarded with reservations until a cross-
validation of 462X0 is finished and the results of cross-applications
to  additional  career  fields  are  available. 

APPENDIX A: HAZARD POTENTIAL RATING SCALE


Rating Scale    Hazard Potential

     1          Extremely Low Hazard Potential
     2          Very Low
     3          Low
     4          Below Average
     5          Average
     6          Above Average
     7          High
     8          Very High
     9          Extremely High Hazard Potential

APPENDIX B: RATER AGREEMENT INDICES AND AVERAGE
MEAN RATINGS FOR TASK FACTORS


Task Factor                                 Rkk*     Average Mean Rating

Hazard Potential                           .9315           1.87
Consequences of Inadequate Performance     .9390           6.16
Task Delay Tolerance                       .8914           4.52
Task Difficulty                            .9302           4.07

* Rater agreement indices for a sample size of 50 raters as
estimated by the Spearman-Brown formula.

APPENDIX C: CORRELATIONS OF VARIABLES (N=527 TASKS)*


                                               Consequences   Task                     Percent      Percent
                                  Hazard       of Inadequate  Delay       Task         Members      Time
                                  Potential    Performance    Tolerance   Difficulty   Performing   Spent

Hazard Potential                  1.0000
Consequences of
  Inadequate Performance           .6992        1.0000
Task Delay Tolerance              -.3394        -.5958        1.0000
Task Difficulty                   -.0439         .2711        -.1438      1.0000
Percent Members
  Performing 1-48 mos.             .3302         .2870        -.4453      -.2526       1.0000
Percent Time Spent
  1-48 mos.                        .3509         .2377        -.4431      -.2820        .9702       1.0000

* Correlations above .088 are significant at the .025 level.

APPENDIX D: PRIORITIZED JOB TASKS IN TERMS OF HAZARD POTENTIAL RATINGS


                                                                      %       %      Con of                 Num
                                                               Haz    Mem     Time   Inad    Del            of
Duty  Task  Task Title                                  Rank   Pot    1-48    Spent  Per     Tol     Dif    Acc

            Arm or Dearm Aircraft Armament Systems         1   6.2     62     1.9    7.5     1.9     3.8      6
              Other Than Guns
F     170   Load or Unload Non-Nuclear Munitions on            5.9     57            7.8     2.5     4.2     10
              Aircraft or Pre-Load Stands or Racks
F     172   Load or Unload Preloaded Non-Nuclear           3   5.9     34      .8    7.8     2.6     4.2      0
              Munitions on Aircraft
F     174   Perform Functional Checks or Tests on         23   4.3     60     1.7    7.7     2.10    4.21     6
              Aircraft Armament Circuits While Loading
P     426   Drive Ammunition Loaders                      76   3.1     18      .4    5.4     4.7     3.1      0
H     230   Perform Operational Checks of Jettison        84   2.9     48      .9    7.3     3.6     4.2      1
              or Emergency Release Systems Using
              Meters or Indicators
P     445   Perform Facility Cleaning on Vehicles        396    .8      7      .9    5.9     3.5     4.3

            Mean =                                            1.87             .19   6.16    4.52    4.07
            SD =                                              1.24   11.16     .27    .86     .81     .55

APPENDIX E: CORRELATIONS BETWEEN PREDICTORS
AND CRITERION (N=527 TASKS)

                                    Accident Frequency
Hazard Potential                         .2721
% Members Performing (1-48 mos)          .3386
% Time Spent (1-48 mos)                  .4215

All correlations significant (p<.025)


APPENDIX  F:     CLASSIFICATION  OF  PERCENTAGES  OF  ACCIDENTS  OCCURRING  ON 
DIFFERENT  PERCENTAGES  OF  TASKS  ORDERED  ON  PREDICTED  NUMBER 
OF  ACCIDENTS  BASED  ON  FULL  REGRESSION  MODEL 


Percentages  of  Accidents      Percentages  of  Tasks 

45  1 

53  5 

67  10 

86  20 

86  30 

90  40 

98  50 


100  100 


APPENDIX G:     CLASSIFICATION  OF  PERCENTAGES  OF  ACCIDENTS  OCCURRING  ON
DIFFERENT  PERCENTAGES  OF  TASKS  ORDERED  ON  PREDICTED  NUMBER 
OF  ACCIDENTS  BASED  ON  EXPOSURE  MODEL 


Percentages  of  Accidents      Percentages  of  Tasks 

45  1 

49  5 

49  10 

67  20 

69  30 

82  40 

92  50 


100  100 


APPENDIX H:     CLASSIFICATION  OF  PERCENTAGES  OF  ACCIDENTS  OCCURRING  ON
DIFFERENT  PERCENTAGES  OF  TASKS  ORDERED  ON  PREDICTED  NUMBER 
OF  ACCIDENTS  BASED  ON  HAZARD  MODEL 


Percentages  of  Accidents     Percentages  of  Tasks 

33  1 

49  5 

65  10 

88  20 

88  30 

88  40

96  50


100  100


APPENDIX I: CHI SQUARES


1.  Chi Square for Difference From Chance
    for Full Model

                              Percentage of Tasks
                          20     40     60     80    100
    Number of Accidents   42

    χ² = 133.14 (p<.01)


2.  Chi Square for Difference From Chance
    for Exposure Model

                              Percentage of Tasks
                          20     40     60     80    100
    Number of Accidents   33

    χ² = 70.49 (p<.01)


3.  Chi Square for Difference From Chance
    for Hazard Model

                              Percentage of Tasks
                          20     40     60     80    100
    Number of Accidents   43      0      5      1      0

    χ² = 142.33 (p<.01)


4.  Chi Square for Independence Among
    Three Models

    Number of                 Percentage of Tasks
    Accidents            0-1    2-5   6-10  11-20  21-100
    Full                  22      4      7      9       7
    Exposure              22      2      0      9      16
    Hazard                16      8      8     11       6

    χ² = 19.30 (p<.05)


5.  Chi Square for Independence
    Between Full and Exposure Models

    Number of                 Percentage of Tasks
    Accidents            0-1    2-5   6-10  11-20  21-100
    Full Model            22      4      7      9       7
    Exposure Model        22      2      0      9      16

    χ² = 11.19 (p<.05)


6.  Chi Square for Independence
    Between Full and Hazard Models

    Number of                 Percentage of Tasks
    Accidents            0-1    2-5   6-10  11-20  21-100
    Full Model            22      4      7      9       7
    Hazard Model          16      8      8     11       6

    χ² = 2.62 (NS)


The opinions and conclusions expressed in this paper
are those of the authors and are not necessarily
those of the United States Air Force.


Since  the  U.S.  Air  Force  (AF)  developed  its  first  major  instructional 
system  in  1965,  the  systems  approach  to  training  has  received  considerable 
emphasis  within  the  Department  of  Defense  and  in  the  civilian  sector. 
The  issuance  of  AF  Manual  (AFM)  50-2,  Instructional  System  Design,  and 
AF  Pamphlet  (AFP)  50-58,  Handbook  for  Designers  of  Instructional  Systems, 
witnessed  a  realization  on  the  part  of  the  AF  that  application  of 
modern  instructional  technologies  might  yield  substantial  improvements 
in  the  effectiveness  and  efficiency  of  AF  training  programs.    In  both 
documents,  considerable  emphasis  has  been  placed  upon  achieving  close 
correspondence  between  training  program  content  and  job  performance 
requirements. 

The  Occupational  Survey  (OS)  is  an  information  source  useful  for 
accomplishing  job  analysis  and  specifying  job  performance  requirements 
within  the  context  of  AF  technical  training.    However,  it  does  not,  nor 
was  it  intended  to,  generate  the  kinds  of  data  about  job  performance 
subtasks  or  elements  and  supporting  skills  and  knowledges  that  are 
required  to  design  instruction.    These  data  are  the  products  of  a  rig- 
orous task  analysis.    The  process  by  which  a  skilled  instructional 
designer  identifies  the  major  procedural  steps  and  makes  inferences 
about  skill  and  knowledge  requirements  is  not  well  articulated.  Addi- 
tionally, those  in  the  Air  Training  Command  (ATC)  who  are  responsible 
for  conducting  and  documenting  task  analyses  are  Subject-Matter  Special- 
ists (SMSs),  not  educational  technologists.    The  implementation  of  a 
simplified  task  analysis  procedure/documentation  system  and  a  computer- 
based  task  analysis  data  bank  may  offer  significant  economies  in  the 
design  and  revision  of  technical  training  courses.    A  standardized  task 
analysis procedure would help insure that course content decisions are
made on the basis of job performance requirements as moderated by
training situation constraints; and a computer-based data bank would provide
a  means  of  storing,  retrieving,  updating,  and  disseminating  detailed 
task  analysis  information.    Ultimately,  these  economies  might  be  expected 
to  manifest  themselves  in  the  form  of  more  effective  and  less  costly 
training. 

The  primary  objective  of  this  study  is  to  develop  and  field  test  a 
simple-to-use,  reliable,  valid,  and  cost-effective/time-efficient  task 
analysis  procedure  for  application  by  ATC  training  development  personnel 
responsible  for  the  design  and  conduct  of  technical  training  courses.  A 
secondary  objective  is  to  make  recommendations  regarding  the  feasibility 
and  utility  of  implementing  a  computer-based  task  analysis  data  bank  and 
to  submit  a  preliminary  data  bank  design  for  consideration.    End  items 
include: 

a.  A  handbook  detailing  a  standard  task  analysis  procedure  that 
will  provide  an  acceptable  degree  of  uniformity  and  quality 
control  over  task  analysis  efforts  at  ATC  Technical  Training 
Centers  (TTCs);  and 

b.  A  systems  analysis  of  present  and  future  AF  task  analysis 
requirements,  with  special  emphasis  on  the  recommendations 
regarding  future  plans  for  a  task  analysis  data  bank. 

The  investigative  approach  employed  in  this  study  is  straight- 
forward and  comprehensive.    In  Phase  I,  task  analysis  procedures  currently 
in  use  at  ATC  TTCs  were  characterized  and  evaluated,  and  recommendations 
for  improving  the  task  analysis  effort  were  proposed.    In  Phase  II,  a 
standard  procedure  was  specified  and  a  prototype  handbook  was  developed. 
It  will  be  field  tested  at  ATC  TTCs,  and  revised  on  the  basis  of  field 
test  results.    In  Phase  III,  the  task  analysis  handbook  and  the  descrip- 
tion of  data  bank  requirements  will  be  prepared,  reviewed  in  conference 
with  intended  users  and  management  personnel,  and  revised  as  necessary 
prior to finalization. Inherent in this approach is the assumption that
continuous involvement of ATC training development and management
personnel in the design, testing, and revision process will insure that the
final  product  is  useful  and  will  maximize  the  probability  that  it  will  be 
accepted  and  implemented. 

Phase  I 

Survey  Procedures 

A semistructured research interview was employed to gain insight into
the  task  analysis  procedures  currently  being  utilized  in  the  AF  technical 
training  community.    Specific  areas  of  inquiry  were  the  relative  percentage 
of  time  spent  revising  existing  courses  versus  developing  new  ones;  proce- 
dures currently  utilized  to  accomplish,  document,  and  validate  subtask  and 
skill/knowledge analyses; and familiarity with and judged adequacy of
the  task  analysis  guidance  provided  in  AFM  50-2  and  AFP  50-58. 

The sample of interviewees included a full range of training
development personnel, including military and civilian education
specialists, Instructional System Development (ISD) technicians, and
master instructors, who had been or were currently involved with task
analysis efforts at the five ATC Training Centers. In addition,
training development personnel from the 3306th Test and Evaluation Squadron
at Edwards AFB; the School of Aerospace Medicine, Brooks AFB; and the
School of Health Care Sciences, Sheppard AFB, were also interviewed.

Results 

We  found  that  task  analysis  procedures  and  documentation  methods 
utilized  at  the  TTCs  were  widely  variant.    Documentation  produced  in 
response  to  inquiries  regarding  how  the  results  of  task  analyses  were 
recorded  ranged  from  Plans  of  Instruction  (POIs)  to  fairly  detailed  ISD 
worksheets,  most  of  which  were  locally  designed.    It  was  our  feeling 
that  quality  control  of  the  task  analysis  effort  across  branches  within 
the  same  group  would  have  been  difficult,  at  best.    An  integrated 
quality  control  program  across  TTCs  would  be  virtually  impossible. 
Therefore,  an  attempt  to  develop  and  implement  a  standardized  task 
analysis  procedure/documentation  system  for  application  at  all  TTCs 
seemed  a  worthwhile  pursuit. 

We  also  noted  with  some  interest  that  no  individual  or  group  of 
individuals  at  the  TTCs  was  ultimately  accountable  for  the  task  analysis 
effort.    The  issue  of  accountability  is,  of  course,  closely  related  to 
that  of  quality  control.    For  ATC  to  realize  the  maximum  benefits 
associated  with  implementing  a  standardized  task  analysis  procedure/ 
documentation  system  and  to  insure  a  rigorous  quality  control  program, 
an  articulated  accountability  system  must  be  defined  and  implemented. 

Frequently heard comments regarding currently available ISD and
task analysis guidance documents (AFM 50-2 and AFP 50-58) included
"too complex," "require too much paperwork," and "most applicable to
the design of new courses." With regard to the final comment, in
response  to  a  direct  survey  inquiry,  we  found  that  training  development 
personnel  currently  spend  the  great  majority  of  their  time  (in  excess 
of  95%)  completing  task  analyses  in  support  of  the  redesign  of  existing 
technical  training  courses.    There  seems  to  be  a  legitimate  need  for  a 
simplified  task  analysis  procedure/documentation  system  that  can  be 
applied  in  the  revision  of  existing  courses  as  well  as  the  development 
of new courses. Additionally, a preliminary assessment of the
feasibility of implementing an automated storage/retrieval system for task
analysis data seems warranted. This type of data bank would facilitate
the revision of existing courses and would support implementation of a
quality control/accountability system.

Recommendations/Actions

Based  on  survey  findings  and  our  observations,  we  recommended  that 
a simplified task analysis procedure/documentation system, including
improved  procedures  for  in-process  review  of  task  analysis  efforts,  be 
developed  and  field  tested  at  ATC  TTCs.    Additionally,  we  recommended 
investigating  the  feasibility  of  providing  an  automated  storage/retrieval 
system  for  task  analysis  data.    The  Air  Force  Human  Resources  Laboratory 
(AFHRL)  and  ATC  directed  us  to  proceed  with  the  development  of  a  proto- 
type task  analysis  handbook.    Further,  ATC  agreed  to  support  field 
testing  of  the  prototype  handbook  at  the  TTCs. 

Phase  II 

Handbook  Development 

The  task  analysis  handbook  addresses  the  design  and  revision  of 
technical  training  courses  and  presumes  the  existence  of  a  comprehensive 
task  listing  in  the  form  of  a  Specialty  Training  Standard  (STS)  or 
Course  Training  Standard  (CTS).    The  handbook  task  analysis  procedure 
represents  a  best-mix  of  procedures  contained  in  existing  documents  and 
literature,  while  incorporating  the  comments  and  suggestions  made  by 
training  development  personnel  during  the  Phase  I  survey. 

The  handbook  presents  task  analysis  as  a  three  stage  process. 
Figure  1  presents,  in  flowchart  format,  an  outline  of  the  handbook  task 
analysis  procedure. 

Stage  A  consists  of  converting  STS/CTS  task  and  knowledge  items 
into  Preliminary  Criterion  Objectives  (PCO).    In  Stage  B,  each  PCO  is 
examined and broken down into its component subtasks. Finally, Stage C
consists of determining the skills and knowledges which underlie or
support  each  subtask.    It  was  our  strong  feeling  that  identification  of 
supporting  skills  and  knowledges  had  to  be  addressed  if  the  task  analysis 
effort  was  to  achieve  its  two  primary  objectives:    (1)  insuring  that 
only  "need  to  know"  content  was  included  in  a  course,  and  (2)  providing 
an  adequate  information  data  base  to  support  preparation  of  objectives 
and  test  items.    Stage  C  is  primarily  inferential  in  nature  and  therefore 
less amenable to proceduralization than other parts of the process.
However, some guidelines are offered in support of Stage C activities.
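
The three-stage breakdown described above could be held in a simple record structure of the kind a task analysis data bank might store. The sketch below is purely illustrative: the class and field names, and the sample content, are hypothetical and are not taken from the handbook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subtask:
    """Stage B output: one component subtask, with its Stage C support."""
    description: str
    skills: List[str] = field(default_factory=list)
    knowledges: List[str] = field(default_factory=list)


@dataclass
class PreliminaryCriterionObjective:
    """Stage A output: an STS/CTS item restated as a PCO."""
    sts_item: str
    statement: str
    subtasks: List[Subtask] = field(default_factory=list)

    def supporting_content(self) -> List[str]:
        """Distinct skills and knowledges across all subtasks, in first-seen order."""
        seen: List[str] = []
        for subtask in self.subtasks:
            for item in subtask.skills + subtask.knowledges:
                if item not in seen:
                    seen.append(item)
        return seen


# Hypothetical example record.
pco = PreliminaryCriterionObjective(
    sts_item="5a",
    statement="Perform an operational check of a release system",
    subtasks=[
        Subtask("Connect test set", skills=["use test set"], knowledges=["safety precautions"]),
        Subtask("Read indicator", skills=["use test set"], knowledges=["indicator limits"]),
    ],
)
print(pco.supporting_content())
```

Listing the distinct supporting items per objective is one way such a record could support the "need to know" screening of course content that the procedure is meant to achieve.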

It should be noted that there are a number of differences between
the handbook procedures and the detailed task analysis guidance presented
in AFP 50-58. First, the task analysis guidance in AFP 50-58
is fragmented, whereas the prototype handbook presents task analysis as
an integrated process. Second, AFP 50-58 calls for a considerable
amount of task analysis activity prior to finalization of the training
standard, while the handbook assumes a training standard as the point
of departure. Third, the handbook specifies a single format (the
flowchart) for intermediate documentation, and a single form for final
documentation. Importantly, too, the final documentation form is considerably
simpler than the one presented in AFP 50-58. Fourth, the
handbook is built on the assumption that task analysis will be performed
by a Subject-Matter Specialist (SMS) who is relatively inexperienced in
ISD theory and practice. Handbook procedures, therefore, do not require
the SMS to conduct an instructional analysis. The procedures in AFP
50-58, on the other hand, call for the analyst to classify each knowledge
being analyzed (e.g., chaining, associating), to determine proficiency
levels for supporting skills and knowledges, and to specify the
amount of practice required to reach proficiency. It is our feeling
that these decisions are better left to instructional design specialists.
Fifth, and finally, the handbook specifically calls for a series of
reviews by SMSs at key points in the task analysis process. The interaction
between analysts and reviewers should provide an excellent
safeguard against overtraining. It was our feeling that SMS review and
verification is given insufficient emphasis in AFP 50-58.

[Figure 1. Flowchart outline of the handbook task analysis procedure]


Table 1

Differences Between AFP 50-58
and Task Analysis Handbook Procedures

AFP 50-58                          Task Analysis Handbook

Fragmented Procedures              Integrated Process

Analysis Prior to Finalization     Assumes Existence of
of Training Standard               Training Standard

Complex Documentation              Simple Documentation

Assumes Instructional Design       Assumes Technical Subject
Expertise                          Matter Expertise

Requires Managerial Review         Requires More Interaction
                                   and Review by Other SMSs
                                   during Task Analysis


Field  Test  Procedures 

A  two-stage  field  test  of  the  prototype  task  analysis  handbook 
will  be  conducted.    Stage  1  consists  of  preliminary  tryouts,  while 
Stage  2  will  be  devoted  to  feasibility  testing  (i.e.,  formal  evaluation). 

Stage 1 Procedures and Results. Preliminary tryouts were accomplished
to obtain information useful for revising the prototype handbook.
The goal was to develop an empirical data base that could be used to
identify required revisions and make the handbook as useful as possible.
A potentially important by-product of preliminary tryouts was a set of
task analysis examples directly relevant to AF technical training. The
three sites selected for preliminary tryout of the handbook were Keesler
AFB, Edwards AFB, and Chanute AFB.

Every  attempt  was  made  to  insure  that  the  courses  chosen  for 
preliminary  tryouts  of  the  task  analysis  procedure  encompass  the  full 
range  of  technical  training.    Basic  and  advanced  training  for  "soft" 
skill  courses,  operator  training  courses,  and  maintenance  courses  were 
represented. Two or more courses per site were utilized as test beds.
For  each  course,  a  duty  area  was  selected,  and  a  task  performance  item 
and  a  task  knowledge  item  from  within  that  duty  area  were  chosen  for 
analysis.    For  each  course  at  each  site,  two  SMSs  participated  in  the 
preliminary  tryout.    One  of  these  SMSs  served  as  an  analyst,  the  other 
served  as  a  reviewer. 

Analysts then employed the handbook procedures to analyze one task
performance item and one task knowledge item from the selected duty
area and documented the analyses. Task analysts were encouraged to ask
questions, identify problems, and present suggestions for improving the
procedure. If the analyst failed to understand an explanation, another
wording or elaboration was attempted. If the analyst failed to understand
an example, verbal clarification was provided. Problems encountered,
explanations and additional information provided, and suggestions for
improvement, as well as typographical errors and other kinds of difficulties
that the analysts encountered, were recorded. Reviewers had
two tasks during preliminary tryouts. Their primary task, of course,
consisted of reviewing task analysis worksheets and documentation. A
secondary function involved critically reviewing the handbook in an
attempt to identify faulty wording, unclear passages, inadequate
explanations, poor examples, improper sequencing, poor layout, typographical
errors, and other difficulties. Additionally, general
suggestions for improving the handbook and procedures described therein
were solicited.

Additionally, each Technical Training Group (TTG) at each TTC
provided a senior review team, consisting of an educational specialist
and a senior SMS, which examined the handbook, identified problems and
made suggestions for improvement, and completed a free-response questionnaire
containing items related to the adequacy and practicality of the
task analysis procedure/documentation system described in the handbook,
as well as items related to appropriateness of style and format.

To reiterate, the objective of the preliminary tryout phase of
field testing was to gather information that could be utilized to "fine
tune" the handbook prior to feasibility testing (i.e., formal evaluation).
At each test site, those training development personnel who participated
as task analysis teams and as senior review teams generated a sizable
number of suggestions for improving the handbook. There was at each
test site substantial overlap between the suggestions put forth by the
two groups of participants. In our view, this close agreement constituted
consensual validation and provided sufficient justification for
revising the handbook in accord with the suggestions made. Not surprisingly,
both the number of new and the number of major suggestions
generated decreased steadily from site to site. We concluded that the
preliminary tryouts had indeed served their primary purpose: a considerable
amount of "fine tuning" had been accomplished.

Stage 2 Procedures. The three sites to be used for feasibility
testing (formal evaluation) will be Lackland AFB, Lowry AFB, and
Sheppard AFB. Three courses per site will be identified as test beds.
For each course, a duty area will be selected, and a task performance
item and a task knowledge item from that duty area chosen for analysis.

For  each  course  at  each  test  site,  four  SMSs,  one  senior  SMS,  and 
one  training  specialist  will  participate  in  the  feasibility  testing. 
The  pool  of  four  SMSs  will  be  divided  into  two  two-person  task  analysis 
teams.    On  each  team,  one  SMS  will  serve  as  the  analyst,  the  other  as 
reviewer.    The  senior  SMS  and  the  training  specialist  will  serve  as  a 
task  analysis  evaluation  team. 

Analysts  will  utilize  the  task  analysis  handbook  to  analyze  the 
selected  task  performance  item  and  task  knowledge  item  from  the  chosen 
duty  area  and  document  the  analyses.    Those  participants  designated  as 
reviewers  will  participate  in  the  analysis  and  documentation  activities 
in  the  manner  prescribed  in  the  handbook.    The  amount  of  time  required 
by  each  team  to  complete  each  major  step  in  the  analysis  will  be 
recorded.    Upon  completing  the  analysis,  each  analyst  and  reviewer  will 
be  asked  to  complete  a  Handbook  Evaluation  Survey.    The  survey  consists 
of  43  Likert-type  items  that  solicit  opinions  regarding  the  task  analysis 
procedures prescribed in the handbook as well as handbook format and
style. Additionally, three free-response items are also included to
allow  respondents  to  indicate  which  handbook  features  they  like  best 
and  least  and  to  raise  important  issues  not  directly  addressed  in  the 
survey. 
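
As a rough illustration of how responses to the Likert-type items might be tabulated, the following sketch computes a mean rating and the proportion of favorable responses for each item. The five-point scale, the item labels, the cutoff for a "favorable" rating, and the sample responses are all assumptions made for illustration; the scale and scoring actually used in the Handbook Evaluation Survey are not specified here.

```python
from statistics import mean

# Hypothetical responses: one dict per respondent, item label -> rating.
# A 5-point scale (1 = strongly disagree ... 5 = strongly agree) is assumed.
responses = [
    {"item_01": 4, "item_02": 2},
    {"item_01": 5, "item_02": 3},
    {"item_01": 3, "item_02": 2},
    {"item_01": 4, "item_02": 1},
]


def summarize_item(responses, item, favorable_min=4):
    """Return (mean rating, proportion of ratings >= favorable_min) for one item."""
    ratings = [r[item] for r in responses if item in r]
    favorable = sum(1 for x in ratings if x >= favorable_min)
    return mean(ratings), favorable / len(ratings)


for item in ("item_01", "item_02"):
    avg, pct = summarize_item(responses, item)
    print(f"{item}: mean = {avg:.2f}, favorable = {pct:.0%}")
```

Items with a low mean or a low favorable proportion would be natural candidates for closer reading alongside the free-response comments.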

Evaluators will then be asked to review the completed task and
knowledge analysis and assess each analysis from the standpoints of
accuracy, completeness, and overall adequacy as a basis for the development
of objectives, the preparation of tests, and the design of instruction.
They will also be asked to judge the degree of correspondence
between the analyses produced by the two analysis teams.

Additionally, each TTG at each TTC will provide a senior review
team, consisting of an educational specialist and a senior SMS, which
will examine the handbook, identify problems, make suggestions for
improvement, and complete the Handbook Evaluation Survey.

The simplicity of the handbook procedure will be assessed by
examining analyst, reviewer, and senior review team opinions regarding
the readability of the manual, the clarity of the explanations offered,
the adequacy of examples included, and the appropriateness of the
terminology. These data will be gathered with the Handbook Evaluation
Survey.

The validity of the handbook procedures will be assessed by examining
the opinions of the task analysis evaluation teams with regard to:
the accuracy of each analysis; the completeness of each analysis; and
the overall adequacy of each analysis as a basis for developing objectives,
preparing tests, and designing instruction. An overall rating of
the quality of each analysis will also be solicited.

The reliability of the handbook procedures will be assessed by
examining the correspondence between analyses for each course (evaluation
team judgments), and the consistency of high correspondence across
courses. The consistency with which the new procedures produce high
quality results will provide an additional index of reliability.

Summary  and  Conclusions 

An investigative study was undertaken at the behest of ATC to
review task analysis methodologies currently in use, to recommend
improvements in current procedures, and to develop a simple-to-use, reliable,
valid, and cost-effective/time-efficient task analysis procedure. In
addition, should a successful procedure be developed, the study would
examine the feasibility of providing an automated storage and retrieval
system for task analysis data.

Results from Phase I of the study included strong recommendations
for development of a simple standardized procedure and documentation
system, and establishment of accountability for task analyses. Furthermore,
it was recommended that the procedure be oriented toward both
course revision and initial course development.

A  new  task  analysis  procedure  was  developed  to  satisfy  the  ATC 
requirements.    The  procedure  differed  significantly  from  current  AF 
recommended  task  analysis  procedures  in  that  it  is  simpler,  designed  for 
SMSs,  requires  streamlined  documentation,  requires  accountability,  and 
is  an  integrated  process.    Preliminary  tryouts  of  the  prototype  task 
analysis  procedures  resulted  in  a  handbook  that  could  be  formally 
evaluated. Conclusions about the success of the handbook must wait
until the final formal testing has been conducted and evaluated.

INTRODUCTION

The  U.S.  Army  Field  Artillery  School  at  Ft.  Sill,  Oklahoma  is 
charged  with  the  responsibility  of  training  artillery  officers  in  all 
facets  of  artillery  systems  performance.    One  component  of  this  system 
is  the  location  of  enemy  targets  and  subsequent  destruction  of  these 
targets  through  direction  of  fire  by  an  observer  located  in  a  forward 
position  in  the  combat  zone,  remote  from  the  artillery  pieces.  The 
accuracy  and  rapidity  with  which  the  forward  observer  (FO)  is  able  to 
perform these tasks have a direct bearing on the outcome of the battlefield
situation, i.e., whether enemy targets are destroyed or disabled.
With  advances  in  battlefield  weapons  technology  and  enemy  mobility,  the 
role  of  the  FO  has  become  even  more  critical.    Recently,  concern  has 
been  expressed  regarding  the  selection  of  personnel  who  are  best  suited 
to  perform  these  tasks  and  the  requisite  training  necessary  to  increase 
the  efficiency  and  effectiveness  of  the  combat  artillery  unit. 

In response to this concern, a Weapons System Training Effectiveness
Analysis (WSTEA) study was conducted by the Directorate of Evaluation at
the Army Field Artillery School. That study focused on the forward
observer component of the Field Artillery system. Its findings indicated
that considerable improvement in the effectiveness of the system
could be achieved by improving the accuracy of both target acquisition
and location on the part of the FO.

It is clear from the WSTEA report that FO performance is not at the
desired level. The WSTEA evaluation revealed that although accurate
fire delivery could be achieved, forward observers required an average
of 4.7 artillery rounds in adjustment to achieve the desired accuracy.
The Army Training and Evaluation Program (ARTEP) standard is three rounds
for adjustment prior to firing for effect. Other results of the WSTEA
field evaluation showed self-location accuracy and target location accuracy
to be below ARTEP standards.


*This is based upon research being conducted for the U.S. Army Research
Institute for the Behavioral and Social Sciences under Contract
DAHC-19-78-C-0025.

Additional studies (Eschenbrenner & Taylor, 1969; Taylor &
Eschenbrenner, 1970; Taylor, Eschenbrenner, & Valverde, 1970; Dominque,
1973; Laveson & De Vries, 1973; U.S. Army Combat Development Command,
1968; U.S. Army Field Artillery School, 1975; and Thomas, 1976) suggest
the same conclusion reached by the WSTEA team. FOs are not performing
at acceptable levels overall and in some cases, performance is so far
below acceptable standards that it would severely impair combat effectiveness.
In order to upgrade the performance level of the Field
Artillery FO, and thereby improve the combat effectiveness of the field
artillery subsystem, increased emphasis must be placed on the selection
and training of FOs who can demonstrate competence on combat-referenced
operational performance measures. This can be achieved by analyzing the
forward observer tasks, developing a profile of the effective forward
observer, and specifying the correspondence between this profile and
valid performance criteria.

The  following  paper  presents  the  McDonnell  Douglas  Astronautics 
Company  -  St.  Louis  (MDAC-St.  Louis)  approach  to  the  development  of  a 
methodology  for  the  selection  and  training  of  field  artillery  FOs. 

TECHNICAL  APPROACH 

The MDAC-St. Louis approach to the selection and training of FOs
incorporates a job analysis of current FO job and skill requirements
with a training analysis of the FO component of the Field Artillery
Officer Basic Course (FAOBC). In the FO Job Analysis, two techniques,
task analysis and profile development, have been combined in order to
maximize the amount of information available for the decision process in
the training analysis phase. The task analysis element will identify the
essential skills and knowledges an FO needs in his combat
role. The profile development will supplement the task analysis with an
examination of the critical characteristics, abilities, aptitudes, personalities,
education, and personal histories of the successful FOs.
Neither of these techniques, task analysis nor profile development, is
particularly innovative in its usual context, especially since task
analysis, in the classical usage, does involve some elements of trainee
characteristic description. However, the combination of task analysis
with the type of profile development procedure that is typically the
domain of personnel selection will provide the basic standards for FO
selection, as well as the information critical to the determination of
FAOBC program effectiveness. Additionally, it will furnish the data
necessary to suggest improvements to be incorporated into FO training
and to upgrade and standardize that program.

Job  Analysis 

The primary objective of the FO job analysis is the identification
of the critical tasks an FO must perform in order to achieve his mission.
These essential job elements will be compared with the existing FO training
program to determine if all critical tasks are being taught. TRADOC
Pamphlet 350-30, Interservice Procedures for Instructional Systems Development,
Phase I: Analyze, outlines four basic procedures to be used in the
conduct of a job analysis: 1) development of a tentative task list,
2) authentication of the task list, 3) validation of the task list, and
4) identification of subtasks, conditions, cues and standards. Since
the present research is not specifically directed toward the development
of detailed behavioral objectives and instructional materials, but to
the identification of critical skills, our activity is directed to the
task level of specificity rather than to the subtasks, conditions and
standards level.

The  initial  task  listing  was  developed  by  extracting  FO  and  possible 
FO  tasks  from  pertinent  OBC  texts  and  from  direct  observation  of  FO 
training  activities.    Special  emphasis  was  placed  on  Gunnery,  Map  Reading, 
and  Counterfire  texts  and  on  graded  and  ungraded  firing  exercises.  Once 
the  tentative  lists  were  developed,  they  were  consolidated  into  a  list  of 
candidate FO tasks, and a preliminary task categorization scheme was
developed. 

The  list  of  candidate  tasks  was  reviewed  with  FAOBC  instructors  from 
the  Gunnery,  Counterfire  and  Tactics  departments  at  the  Field  Artillery 
School.    At  least  ten  instructors  from  each  department  were  interviewed 
for  task  selection  purposes.    Because  the  refinement  of  a  task  listing 
is  an  iterative  process,  the  list  which  reflects  the  inputs  of  the  FAOBC 
instructors  is  not  considered  a  final  task  listing,  but  will  be  subject 
to  further  review. 

The revised FO task list will be reviewed by additional FO instructors
and FOs assigned to organic Field Artillery Units. Structured
interviews with fifty FOs are scheduled. The interviewees will be asked
to evaluate each task for offensive and defensive scenarios in the following
theaters: European theater, Far Eastern theater, and Middle East and
African theaters. Interview data will be augmented by information
collected via questionnaires distributed to FOs who have served in Europe,
Korea, Vietnam and CONUS. The questionnaires will also include items
pertinent to training and profile development.

Profile  Development 

Profile development will emerge from analytical and statistical
examinations of a critical skills and characteristics list for the effective
FO and from an assessment of the makeup of the current FAOBC student
population. The list of critical skills and characteristics is being
developed primarily from the following three sources and procedures:
1) examination of the prioritized FO task list, 2) interviews with
experienced FOs and FAOBC instructors, and 3) questionnaire responses
from experienced FOs.

The examination of the prioritized FO task list presumes that certain
tasks demand specific, requisite skills and characteristics. Similarly,
to operate specialized FO related equipment necessarily demands certain
abilities which must be components of the critical skills list and,
observing logical sequence, components of the profile. List elements
emerging from this process will be further evaluated when interview and
questionnaire data sets are complete.

Interviews will be used not only for further refinement of the critical
skills and characteristics listing, but for the generation of new
elements for inclusion in the critical skills and characteristics lists.
Artillery Officers assigned at Ft. Sill, Oklahoma, and officers assigned
at other CONUS installations will be interviewed. The interviews with
these officers will serve to provide a more diverse sample.


Characteristics  and  critical  skills  identified  from  the  above 
activities  are  being  subjected  to  further  evaluation  using 
questionnaires.    Additional  elements  of  a  skills  and  characteristics 
listing  will  be  directly  solicited  using  the  same  questionnaire. 
Descriptive  statistics  will  be  compiled  for  the  questionnaire  responses 
and  used  for  further  refining  of  the  profile. 

A second major component of the profile development activity relates
to the development and refinement of the FO Personal Profile Questionnaire
and the provisional validation of the profile developed with that instrument.
A developmental version of that questionnaire has been administered
to FAOBC 12-78. Item analysis on this version will be completed with
comparisons of upper and lower criterion group performance along several
criterion dimensions. Those include firing scores for individual graded
shoots, a combined firing score, gunnery, counterfire, and tactics grades,
and the overall OBC grade. The criterion measures will not be available
for a few weeks, but some early frequency data from selected questionnaire
items are included in the preliminary results section. The training
and intermediate criteria will allow the research team to select those
items with the greatest potential for discriminating between high and low
ability student FOs. Additionally, certain items provide data useful for
training development independent of the criteria. Information gleaned
from the analysis of the first developmental form will be used to refine
the questionnaire. The revised form will be administered to FAOBC 3-79.
Analysis of responses to that questionnaire will serve to further improve
the profile development device. The development of the profile will also
include an evaluation of characteristics of current OBC students reflected
in personal data sheets. Variables identified here will be analyzed in
conjunction with factors from the FO Personal Profile Questionnaire and
the critical skills and characteristics list. A preliminary model of
the effective FO will emerge from this analysis activity.
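
The comparison of upper and lower criterion groups described above can be sketched as a simple discrimination index: the difference between the two groups in the proportion endorsing an item. The 27 percent group size, the 1/0 item coding, the field names, and the sample records below are assumptions for illustration only; the grouping rule actually used in the study is not stated.

```python
def discrimination_index(records, item, criterion, fraction=0.27):
    """Difference in endorsement rate of `item` between the upper and lower
    criterion groups. `records` is a list of dicts; the item is coded 1/0."""
    ranked = sorted(records, key=lambda r: r[criterion])
    n_group = max(1, round(len(ranked) * fraction))
    lower, upper = ranked[:n_group], ranked[-n_group:]
    p_upper = sum(r[item] for r in upper) / n_group
    p_lower = sum(r[item] for r in lower) / n_group
    return p_upper - p_lower


# Hypothetical students: a combined firing score and one yes/no questionnaire item.
students = [
    {"firing_score": 62, "item_a": 0},
    {"firing_score": 68, "item_a": 0},
    {"firing_score": 71, "item_a": 1},
    {"firing_score": 75, "item_a": 0},
    {"firing_score": 80, "item_a": 1},
    {"firing_score": 84, "item_a": 1},
    {"firing_score": 88, "item_a": 1},
    {"firing_score": 93, "item_a": 1},
]

d = discrimination_index(students, "item_a", "firing_score")
print(f"discrimination index for item_a: {d:+.2f}")
```

An index near zero would suggest the item does not separate high and low performers on that criterion; the same function can be rerun with each criterion measure in turn.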

PRELIMINARY  FINDINGS 

As  an  example  of  how  the  various  steps  of  the  job  analysis  interact 
with  each  other  and  impact  the  training  analysis,  we  have  developed  a 
series  of  regression  equations  and  summary  statistics  for  selected 
samples  of  FAOBC  student  course  grades  and  personal  profile  questionnaire 
responses. 

Data collected from students of FAOBC 6-78 were examined as part of
a preliminary hypothesis generating activity. More extensive data sets
for three separate samples, all considerably larger, are being collected
and will be analyzed to evaluate hypotheses generated in this activity.*


The predictor variable data file for each student included age;
source of commission (comprised of four dummy variables, Army ROTC, Navy
ROTC, Army OCS, and National Guard, with Marine PLC as the reference);
marital status; college major (composed of the dummy variables science
and math, business, and education, with liberal arts as the reference);
and scores on two tests administered at the beginning of OBC, the Mathematics
subtest of the Sequential Tests of Educational Progress (STEP)
and the nonverbal subtest of the Lorge-Thorndike Intelligence Test.
Criterion measures available for this early analysis included firing
accuracy scores from two graded shoots; ten subcourse test scores; and
a weighted average of these which will, for convenience and clarity,
be referred to as the average grade. Three regression models will be
presented and their implications discussed.
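
The dummy coding described above can be illustrated with a short sketch. The category names follow the text; the function names and the dictionary layout are assumptions introduced for illustration and do not describe the software actually used.

```python
def dummy_code(value, categories, reference):
    """Return 0/1 indicators for every category except the reference.
    The reference category is represented by all zeros."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return {c: int(value == c) for c in categories if c != reference}


# Categories as listed in the text; the reference category carries no indicator.
COMMISSION = ["Army ROTC", "Navy ROTC", "Army OCS", "National Guard", "Marine PLC"]
MAJOR = ["Science and Math", "Business", "Education", "Liberal Arts"]


def encode_student(source, major):
    """Build the indicator portion of one student's predictor record."""
    row = {}
    row.update(dummy_code(source, COMMISSION, reference="Marine PLC"))
    row.update(dummy_code(major, MAJOR, reference="Liberal Arts"))
    return row


print(encode_student("Army OCS", "Business"))
print(encode_student("Marine PLC", "Liberal Arts"))
```

Because the reference categories are coded as all zeros, each B value for a dummy variable is read as a difference from Marine PLC graduates (for source of commission) or from liberal arts majors (for college major).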

The first model was constructed using average grade as the dependent
variable and allowing the predictors to enter (or exit from) the model
according to a stepwise variable selection procedure. The descriptive
linear multiple regression model achieved is:

Y = B0 + B1X1 + B2X2 + ... + B6X6 + e (1)

Table 1 shows B values; the order of variable selection; the value of the
statistic, F, when each predictor variable was entered; and changes in
R2 with the addition of variables. The value of R2 for the model is .489.
Although not great, it is suggestive in light of the modest sample size
and the preliminary nature of the analysis.

TABLE 1

SUMMARY OF REGRESSION MODEL 1 - AVERAGE GRADE

                                    B
Variable                     (In Percentage   Increase   Total   F
Descriptor        Variable       Points)       in R2      R2     To Enter

STEP Score           X1            .147         .281      .281    17.60
Army OCS             X2          -3.080         .043      .324     2.80
Navy ROTC            X3           5.561         .027      .351     1.79
Married              X4           3.517         .045      .397     3.15
Education Major      X5           5.184         .046      .443     3.39
Business Major       X6           3.285         .046      .489     3.61
Constant (B0)                    41.542
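
One way to read the last three columns together: for a single variable added at step k, the standard partial F statistic is F = (increase in R2) / ((1 - total R2) / (n - k - 1)). The sketch below applies that formula to the tabled values with n = 47, the sample size given in the footnote. Whether the original analysis computed F to enter in exactly this way is an assumption, and small discrepancies are to be expected because the tabled R2 values are rounded.

```python
N_STUDENTS = 47  # sample size reported in the footnote

# (variable, increase in R2, total R2, tabled F to enter), in order of entry
STEPS = [
    ("STEP Score", 0.281, 0.281, 17.60),
    ("Army OCS", 0.043, 0.324, 2.80),
    ("Navy ROTC", 0.027, 0.351, 1.79),
    ("Married", 0.045, 0.397, 3.15),
    ("Education Major", 0.046, 0.443, 3.39),
    ("Business Major", 0.046, 0.489, 3.61),
]


def f_to_enter(delta_r2, total_r2, step, n=N_STUDENTS):
    """Partial F for one predictor added at `step` (1-based):
    the R2 gain divided by the residual mean square in R2 units."""
    residual_df = n - step - 1
    return delta_r2 / ((1.0 - total_r2) / residual_df)


for step, (name, delta, total, tabled) in enumerate(STEPS, start=1):
    computed = f_to_enter(delta, total, step)
    print(f"{name:<16} computed F = {computed:6.2f}   tabled F = {tabled:6.2f}")
```

With these inputs the computed values agree with the tabled F values to within about 0.02, which is consistent with rounding in the published R2 figures.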

*The present set included only 47 students for whom an entire data set
was available. The authors are fully aware of the limitations imposed
by this small sample size, but conclusions are intended as preliminary
and to reflect a "data snooping" activity.

The negative effect (B value) of Army OCS when evaluated against a
reference variable of Marine Platoon Leader Course (PLC) graduates indicates
a Marine/Army difference. This difference is further amplified by
the Navy ROTC effect. Virtually all students in OBC 6-78 who received
their commission through Navy ROTC were Marines.

It is not especially surprising that there is such a marked difference
between Marine and Army OBC graduates since the Marines, particularly
the Marine PLCs, receive a significantly greater amount of pre-OBC training
in map reading, land navigation, and terrain association. These
three skill areas have been judged to be critical FO tasks by the FO
instructors in the task identification step of the FO task analysis.
Additionally, the OBC course of instruction presumes prior training
in map reading, land navigation, and terrain association in the allocation
of time-to-task instruction. However, interviews with FO instructors
reveal this assumption to be false. This is supported by the aforementioned
data. If the trend identified by this regression equation is
confirmed, a recommendation in the training analysis phase of the present
research effort might be to pretest on these three tasks to determine
those students requiring remedial work.

College major may have a potentially important effect. As indicated
by regression model 1, the effect of college major accounted for over
9 percent of the variance. Because of the restricted sample, the effect
should be treated cautiously. Again, if this trend is confirmed in
subsequent samples, more definite interpretation could be developed. Presumably,
business and education majors may be more involved with form completion,
routine procedure following, etc., than the liberal arts or science
major, and it is this practice that may account for the difference.
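
To make the size of these effects concrete, the B values from Table 1 can be applied as a prediction equation. In the sketch below the coefficients are those listed in Table 1; the student profiles, the STEP score of 290, and the field names are invented for illustration. Predictors that did not enter the model contribute nothing to the prediction.

```python
# B values as listed in Table 1 (percentage points).
CONSTANT = 41.542
MODEL_1 = {
    "step_score": 0.147,
    "army_ocs": -3.080,
    "navy_rotc": 5.561,
    "married": 3.517,
    "education_major": 5.184,
    "business_major": 3.285,
}


def predict_average_grade(student):
    """Sum the constant and B * X over the predictors that entered the model.
    Predictors missing from `student` are treated as zero."""
    return CONSTANT + sum(b * student.get(name, 0) for name, b in MODEL_1.items())


# Hypothetical students; the STEP score of 290 is an invented value.
baseline = {"step_score": 290}
army_ocs = {"step_score": 290, "army_ocs": 1}
navy_rotc_married = {"step_score": 290, "navy_rotc": 1, "married": 1}

for label, student in [("baseline", baseline),
                       ("Army OCS", army_ocs),
                       ("Navy ROTC, married", navy_rotc_married)]:
    print(f"{label:<20} predicted average grade = {predict_average_grade(student):.2f}")
```

Holding the STEP score fixed, the predicted gap between a Navy ROTC graduate and an Army OCS graduate is the difference of the two B values, or about 8.6 percentage points, which is the Marine/Army contrast discussed above.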

The second regression model examined the radial missed distance of
the location indicated by each OBC student serving as the FO on the
mobile shoot firing exercise SW. In a mobile shoot, students function
as FOs from a vehicle which is moving between individual firing exercises
and is sometimes moving during the actual firing exercise. This means
that the student must locate and adjust rounds from multiple locations
with less opportunity for carefully determined self location than would be
the case with a stationary firing exercise. The descriptive linear multiple
regression model with radial missed distance as the dependent
measure is:

Y = B0 + B1X1 + B2X2 + e (2)

Table 2 shows information important for interpreting this model.

Several important features of this regression model should be noted.
First, it accounted for only 21 percent of the variability in the data.
Second, the magnitude of the Bs is large. Third, only variables
indicating source of commission entered the model. If one were to take
this model seriously, it would imply that Army OCS graduates and Army
officers who completed ROTC do not achieve the level of accuracy in
target location that individuals who obtained their commission from
other sources achieve. This is a hypothesis which can be examined in
future data sets.

TABLE 2

SUMMARY OF REGRESSION MODEL 2 - SW RADIAL MISS DISTANCE

Variable            B              Increase    Total    F
Descriptor          (In Meters)    in R2       R2       To Enter

Army OCS            416.3          .180        .180     9.88
Army ROTC           117.5          .021        .201     1.14
Constant (B_0)      153.0
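To make the size of these coefficients concrete, the fitted equation can be evaluated for each source of commission. The sketch below is illustrative only: it assumes that X_1 and X_2 are 0/1 indicator variables for Army OCS and Army ROTC (the coding is not stated in the text), and the function name is hypothetical. The coefficient values are those reported in Table 2.

```python
# Coefficients reported in Table 2 (meters).
B0 = 153.0       # constant
B_OCS = 416.3    # Army OCS
B_ROTC = 117.5   # Army ROTC


def predicted_miss_distance(army_ocs: bool, army_rotc: bool) -> float:
    """Predicted SW radial miss distance in meters.

    Assumes 0/1 indicator coding for source of commission
    (an assumption; the coding is not given in the text).
    """
    return B0 + B_OCS * int(army_ocs) + B_ROTC * int(army_rotc)


if __name__ == "__main__":
    cases = [("Other sources", False, False),
             ("Army ROTC", False, True),
             ("Army OCS", True, False)]
    for label, ocs, rotc in cases:
        print(f"{label:14s} {predicted_miss_distance(ocs, rotc):6.1f} m")
```

Under this coding the prediction for officers commissioned from other sources is simply the constant, and each coefficient is the additional predicted miss distance for that group.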

It must be pointed out here that radial miss distance should logically
be the closest approximation of the operational criterion available in
the present training environment. Additionally, the FO of the future
field artillery team is more likely to be involved in conducting fire
adjustment from a mobile position. Recent developments include the
development and testing of a Forward Observer vehicle. As such,
identification of predictors of this criterion would be potentially more
valuable than identifying predictors of certain other factors.

The  third  descriptive  model  looked  at  the  linear  multiple  regression 
of  the  predictors  on  the  combined  observed  fire  grade  for  all  OBC  graded 
shoots  and  the  best  two  of  three  hasty  target  location  exercises  conducted 
by  the  Gunnery  Department  at  the  FAS  (G0211).    The  model  achieved  is: 

Y = B_0 + B_1 X_1 + B_2 X_2 + ... + B_6 X_6 + e          (3)

Table  3  shows  pertinent  information  regarding  this  model  in  the  same 
format  as  previously  reported  regression  models. 

TABLE 3

SUMMARY OF REGRESSION MODEL 3 - OBSERVED FIRE GRADE GO-211

                               B
Variable                       (In Percentage    Increase    Total    F
Descriptor         Variable    Points)           in R2       R2       To Enter

Business Major     X_1           9.231           .153        .153     8.15
Navy ROTC          X_2          11.144           .116        .270     7.01
Education Major    X_3           9.801           .066        .336     4.28
Married            X_4           4.737           .057        .393     3.96
Army OCS           X_5          -4.581           .065        .459     4.99
Large Score        X_6          -0.119           .036        .494     2.81
Constant (B_0)                  91.928
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As  with  the  Average  Grade,  college  major  and  source  of  commission 
have an effect on this grade. (Of course this grade is not independent
of the average grade, r = .686.) This suggests that major and source of
commission influence observed fire grades and not just the total course
grade.
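In a stepwise summary such as Table 3, each variable's increase in R2 should accumulate to the running total. The short check below, a sketch that uses only the values printed in Table 3, compares the two columns; the variable names are illustrative.

```python
from itertools import accumulate

# "Increase in R2" and "Total R2" columns as printed in Table 3.
increase = [0.153, 0.116, 0.066, 0.057, 0.065, 0.036]
total = [0.153, 0.270, 0.336, 0.393, 0.459, 0.494]

# Running sum of the increases, one entry per step of the regression.
running = list(accumulate(increase))

# Largest gap between the running sum and the printed total.
max_gap = max(abs(r - t) for r, t in zip(running, total))

if __name__ == "__main__":
    for r, t in zip(running, total):
        print(f"running sum {r:.3f}   printed total {t:.3f}")
    print(f"largest discrepancy: {max_gap:.3f}")
```

The two columns agree to within a few thousandths, which is what would be expected if each printed increase was rounded independently to three decimals.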

A developmental form of the Forward Observer Personal Profile
Questionnaire was administered to 192 FAOBC 12-78 students at the
beginning of training. Their responses on the questionnaire items will
eventually be compared with end-of-course and in-course scores to
determine an "FO profile." Without test scores on FAOBC 12-78 students,
very little information can be gleaned from this administration of the
questionnaire. However, the responses on two questions are of interest
in this discussion of preliminary findings.

The first question asked "What was your first branch choice?"
Possible responses included: artillery, infantry, armor, combat
engineer, finance, adjutant general, and other noncombat branch. FAOBC
12-78 students selected as their first choice: 41% artillery, 6%
infantry, 8% armor, 6% combat engineer, 3% finance, 8% adjutant general,
and 28% other noncombat branch. If the categories are collapsed, these
responses indicate that 59% chose some nonartillery branch of the Army
as their first choice. A noncombat branch (finance, adjutant general, or
other noncombat branch) was the first choice of 39% of the total sample,
or about two-thirds of the 59% who chose a nonartillery branch. These
data, if this trend continues in later samples, suggest a possible
motivational factor. The question then arises: should only students who
want to be in the field artillery combat arms branch of the Army be
admitted? At this time, this is not a viable solution. How then, in the
course of instruction, do you change this attitude, not necessarily to
one of wanting to be in the field artillery (albeit desirable) but to
one of wanting to do well in FAOBC?
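The collapsed categories follow directly from the reported percentages. As a minimal check, the sketch below recomputes them. It assumes that finance, adjutant general, and other noncombat branch are the noncombat categories; the variable names are illustrative.

```python
# First-branch-choice percentages reported for FAOBC 12-78.
first_choice = {
    "artillery": 41,
    "infantry": 6,
    "armor": 8,
    "combat engineer": 6,
    "finance": 3,
    "adjutant general": 8,
    "other noncombat": 28,
}

# Assumed grouping: these three categories are the noncombat branches.
NONCOMBAT = ("finance", "adjutant general", "other noncombat")

# Percent of the sample choosing any branch other than artillery.
nonartillery = sum(v for k, v in first_choice.items() if k != "artillery")

# Percent of the sample choosing a noncombat branch.
noncombat = sum(first_choice[k] for k in NONCOMBAT)

# Noncombat choices as a fraction of nonartillery choices.
share_of_nonartillery = noncombat / nonartillery

if __name__ == "__main__":
    print(nonartillery, noncombat, round(share_of_nonartillery, 2))
```

On these figures the nonartillery share is 59 percent, the noncombat share is 39 percent of the total sample, and noncombat choices make up roughly 66 percent of the nonartillery choices.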

The second question dealt with their judgment of the principal
factor involved in most failures to hit the target. Possible responses
included: a breakdown in communications, inadequate performance by the
FO, inadequate equipment, errors on the part of the gun crew, errors in
the FDC, and gun error and weather factors. Fifty-eight percent felt
that inadequate performance by the FO accounted for most failures to
hit the target. Twenty percent thought it was the result of a breakdown
in communications; 11 percent, gun error and weather factors; four
percent gave no response; four percent, inadequate equipment; two
percent, errors on the part of the gun crew; and one percent, errors in
the FDC. These responses were given before the students had received
any FO training. If they hold this attitude prior to training, how does
it affect their motivation to learn and, secondly, what can be done
within OBC to change this attitude?

The motivational issues raised by these two questions only serve
to pinpoint areas requiring further analysis. Only if a relationship
between these types of questions and the dependent scores is determined
can there be any real, substantive discussion of alternatives.
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INTRODUCTION 

The ability of a human observer to locate himself on the earth's
surface in relation to other objects or targets on that same surface has
widespread military and civilian application, the importance of which is
easily overlooked because the skill is assumed to exist uniformly among
individuals. In land and sea navigation training, self-location or
spatial orientation ability is often implicitly assumed to exist at
levels common to all individuals, even though there is extensive
evidence to the contrary (Witkin, 1946; Woodring, 1939). There has been
an extensive research effort in the area of spatial orientation related
to localized brain damage (Ratcliff and Newcombe, 1974; Hecaen,
Tzortzis, and Masure, 1974), sex differences (Cohen, 1977; Maxwell,
Croake, and Biddle, 1976; Pellegrini and Empey, 1971), age differences
(Howard and Templeton, 1966), and race differences (Osborne and Gregor,
1966), but relatively little research has been specific to self-location
or geographical spatial orientation and military map training involving
target acquisition for indirect fire weapons. The purpose of the
exploratory research reported here is to examine self-location
abilities, as they relate to cognitive directional orientation, by
developing an instrument capable of identifying those who do poorly or
do well on such directional tasks.


The views expressed in this paper are those of the authors and do not
necessarily reflect the views of the Army Research Institute or the
Department of the Army.

Sincere appreciation is expressed by the authors to Dr. Donald O. Weitzman,
US Army Research Institute, whose work in this area generated an interest and
provided a framework for the authors. Appreciation is also expressed to
MAJ D. Nemetz and SFC E. Johnson, US Army Research Institute, Fort Sill Field
Unit, for their assistance in data collection.
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REVIEW  OF  LITERATURE 


The importance of self-location abilities was demonstrated by the
Army's Human Engineering Laboratories in a field test of the field
artillery indirect fire system in the early 1970s (Technical Memorandum
24-70). This field test found that over 50% of the error variance in the
indirect fire system was attributed to the forward observer's inability
to locate the target, or himself in relation to the target, within
acceptable standards. Army Training and Evaluation Program (ARTEP)
standards allow a maximum error of 250 meters in target location. Field
tests reveal, however, that the average target location error is between
500 and 700 meters. This field test, although well designed and
executed, encountered difficulties in controlling nuisance variables
which may have influenced the reliability of forward observer
performance, as the authors noted in that study. The 50% of error
variance attributed to the forward observer may, and probably does,
overestimate his contribution. There appears, however, little doubt
either empirically or logically that the accuracy of the forward
observer largely determines the accuracy of indirect fire weapons.

The rifle marksman's accuracy is affected by the condition of his rifle
and the weather conditions but most importantly is determined by his aim
or perceptual judgment. With indirect fire weapons, however, the crew
doing the firing neither sees the target nor calculates adjustments due
to weather, distance, etc. These functional tasks are broken down and
performed by other team members who, in the case of the forward
observer, may be separated by many miles from the actual guns being
fired. The forward observer generally is the only member of the indirect
fire team who can actually observe the target being fired upon; he
transmits his observations to the fire direction center (FDC), where
this information is processed by calculating weather conditions, gun
location, type of munition being fired, etc. These calculations are then
sent to the gun crew in the form of elevation and deviation settings
which are set on the guns before the rounds are fired. The forward
observer observes the impact of the rounds fired and transmits
corrections to the fire direction center, which in turn recalculates and
sends new elevation and deviation information to the gun. The essential
difference between the perceptual judgment (aiming) used by the rifle
marksman and the observing done by a forward observer is in the area of
what the researchers call "conceptual associating."

The rifle marksman, once he has established the range of his target
and adjusted the sights on his weapon, is faced primarily with a
perceptual alignment task in that he must be concerned with the
placement of the adjusted and aligned sights upon the target for
accuracy. The forward observer, on the other hand, is faced with the much
more complex task of associating a target he can see on a horizontal
plane to a military map drawn in the vertical plane. He must be able to
analyze the actual terrain from one perspective and interpolate what
that terrain looks like when expressed in symbols and from a different
perspective. Thus it is primarily a conceptual task requiring extraction
and association of information in a form other than that observed.

Kozlowski and Bryant (1977) studied geographical spatial orientation
ability in a series of three experiments in an attempt to further
investigate individual differences in orientation skills reported in the
research literature. The first experiment divided human subjects (N=45)
into categories of either good sense-of-direction or bad
sense-of-direction. The subjects were then tested to see if what people
say about their sense of direction relates to their actual directional
and mapping abilities. The first test consisted of pointing to unseen
buildings, a map-drawing task, and a pointing to north and nearby cities
task. The results of this experiment indicated that the better the
self-report of sense-of-direction, the better was the orientation
performance. Average pointing error was 19.3° (SD=9.5) and 33.2°
(SD=14.6) for good and poor sense-of-direction subjects respectively,
t(43)=3.41, p < .01.

The second experiment in this research was a refinement of the first
with the inclusion of additional independent variables. Subjects were
given direction, distance, and time estimation tasks. Results indicated
that self-reports of sense-of-direction and self-reports of
distance-estimation ability are highly correlated, and the better the
sense of direction or distance, the smaller the pointing error. The mean
pointing error was 10.79° (SD=5.08) for good sense-of-direction people
and 25.71° (SD=19.53) for poor sense-of-direction people. According to
the authors, the failure of time or distance-estimation performance to
correlate with anything was probably due to lack of variation in the
performance data.

The third experiment attempted to answer the question "How well would
self-reports of directional ability be able to predict spatial performance
in a novel environment?" A human-size maze, in the form of a section of
tunnels underneath a dormitory complex, was used to answer this question.
The subjects were led through the maze once and then traveled the maze as
a group for three trials in which performance measures were observed for
time, distance, and direction, along with self-reports of the same
performance variables after each trial. The researchers found in this
study that people with good and poor senses of direction do not differ in
their average pointing error, in the accuracy of their estimation of
straight line or route distance to the end of the tunnel, or in their
estimation of time spent in the tunnels (F ratios < 1). Analysis of the
results of these three experiments led the researchers to conclude that,
far from having an extreme facility at orientation, one that requires
little work, the good sense-of-direction people appeared to be more
active and put more effort into the tasks.
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The  group  method  of  traveling  through  the  maze  of  tunnels  may  have 
hidden some significant differences between the two categories. Those
with poor sense-of-direction may have simply gone along with the good
sense-of-direction people. This possibility was acknowledged in the
study by citing the findings of Beck and Wood (1976), which suggested
personality differences between people who exhibit exploratory behavior
("mixers") and those who stay close to a known place in a novel
environment ("fixers"), which could account for differences observed.

The interpretation of personality or innate differences in the subjects
rather than simple learning/experimental differences between good
sense-of-direction people and those of poor sense-of-direction can be
supported from the literature. Tryon (1939) conducted a series of
experiments on maze "bright" and maze "dull" rats and concluded that
sensory abilities or simple learning could not account for the observed
differences in the rats. Tryon proposed the hypothesis that good maze
learners were better at developing directional sets than poor maze
learners. This supports the view that high-level cognitive processes
rather than simple learning may account for differences in good and poor
sense-of-direction people.

The Field Artillery School (FAS) at Fort Sill, as a result of the Human
Engineering Laboratories analysis of indirect fire systems previously
cited, attempted a further analysis of forward observer performance (ACN
32750, 1977, WSTEA Phase Ia). The FAS compared two data groups: one
consisted of data gathered from officer basic classes, and the other was
composed of artillery officers from field units. Evaluation of the
institutional data consisted of target location and observed fire scores
correlated with map reading scores, number of shoots, and nonverbal
tests. Significant correlations were found among all variables except
target location and observed fire scores, and target location and number
of shoots. These results should be accepted with caution, however,
because a sample as large as this (N=1281) ensures that even very small
correlations will be statistically significant regardless of the
meaningfulness of such correlations.
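The caution about sample size can be made concrete by asking how small a correlation would still reach significance with N=1281. The sketch below is an illustration, not part of the FAS analysis: it assumes a two-tailed test at the .05 level and uses the normal approximation (critical value 1.96) in place of the exact t distribution, which is reasonable for a sample this large. The function name is hypothetical.

```python
import math


def critical_r(n: int, critical_value: float = 1.96) -> float:
    """Smallest |r| that reaches significance for sample size n.

    Solves t = r * sqrt(df) / sqrt(1 - r**2) for r, with df = n - 2.
    The default critical value is the normal approximation for a
    two-tailed .05 test (an assumption, adequate for large n).
    """
    df = n - 2
    return critical_value / math.sqrt(critical_value ** 2 + df)


if __name__ == "__main__":
    r = critical_r(1281)
    print(f"critical r = {r:.4f}, variance explained = {r * r:.4f}")
```

Under these assumptions a correlation of about .055, which accounts for roughly 0.3 percent of the variance, would be statistically significant with 1281 cases.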

The field test (N=45) analyzed self-location, target location, and shoot
scores in relation to map reading scores, previous institutional shoot
scores, visual acuity, depth perception, nonverbal tests, and number of
practice missions. Correlational analysis revealed that only two pairs of
the variables were correlated at a significant level: the nonverbal
tests with self-location, and map reading scores with field shoot
scores. The fact that so few relationships were found to be significant
is surprising but must be considered in light of rather severe
methodological problems reflected in the study. Although the FAS study
failed to show a significant relationship between target location error
and observed fire scores, the study concluded that accurate target
location ability was the primary shortcoming of the forward observer.
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Based  upon  these  results  the  FAS  conducted  an  additional  study  to 
analyze the effect of doubling the amount of map reading instruction
given. Comparison of groups of students whose map reading instruction
was doubled in length with control groups revealed no significant
differences between the groups (WSTEA Phase Ic, undated).

The studies reviewed here are suggestive of differences among
individuals in spatial orientation, self-location, and target location
abilities. Spatial orientation abilities vary with self estimates of
spatial orientation ability and are related to later performance on
orientation tasks. Experience and training may be related to orientation
performance, but such a relationship has not yet been clearly
demonstrated in the research. All the studies reviewed here have strongly
suggested the presence of personality and/or innate differences which
may account for differences in performance.

The purpose of the study reported here was to gather additional
empirical data on a limited part of spatial orientation abilities.
Particularly, the researchers sought information as to the relationships
or differences among individuals on self-location abilities and
directional orientation abilities. Significant findings of relationships
between these two variables were sought by the researchers as an
important starting point or pilot study for larger and more
comprehensive research designs.

METHOD 

The experimenters used a one-way analysis of variance design in which
human observers (N=30) were divided into categories of either high or
low self-location abilities (median split) on a previously administered
practical exercise in which the observer was required to locate his
geographical position in relation to his position on a military map.
The experimenters then measured the subjects' ability on three tasks:
(1) use of a pointing instrument to point the direction to a series of
local landmarks familiar to the subjects, (2) use of a pointing
instrument to point to a series of cities within the United States, and
(3) a visual imagery exercise which required the subjects to mentally
follow a complex set of directions and then report the direction they
were facing at the conclusion and at various points of the exercise.

SUBJECTS 

Subjects were 30 male student officers from an officer basic class
at the Field Artillery School at Fort Sill, Oklahoma. All students
had completed forward observer and related subject course areas at the
time of testing. Self-location scores (percentage correct) were rank
ordered for all 118 students. Each student was assigned a number, and
15 students were randomly selected from the top half and 15 from the
bottom half (median split) of the class.
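The selection procedure described above (rank order, median split, random draw of 15 from each half) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure as run: the function name, the use of a seeded random generator, and the demonstration scores are all hypothetical.

```python
import random


def select_subjects(scores, per_group=15, seed=None):
    """Median-split a class and randomly draw subjects from each half.

    scores: dict mapping student number -> self-location score
            (percentage correct).
    Returns (high_group, low_group) as lists of student numbers.
    """
    rng = random.Random(seed)
    # Rank order from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    half = len(ranked) // 2
    top, bottom = ranked[:half], ranked[half:]
    return rng.sample(top, per_group), rng.sample(bottom, per_group)


if __name__ == "__main__":
    demo_rng = random.Random(0)
    # Hypothetical scores for 118 students; not data from the study.
    demo_scores = {sid: demo_rng.uniform(40, 100) for sid in range(1, 119)}
    high, low = select_subjects(demo_scores, seed=1)
    print(len(high), len(low))
```

Ties at the median are not handled specially in this sketch; they fall wherever the sort places them.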

APPARATUS  AND  MATERIALS 

Two test instruments were used in this study. The first instrument
was a 38 inch diameter circular piece of plywood which could be situated
on a flat table. The outer edge of this circle was painted with the 6400
mils of a military compass in 10 mil increments. Mils were used in this
research since this is the measurement unit used on military compasses
and can be easily converted to degrees. The center of this circle had
a rotating post with a 38 inch pointer which could be pointed in any
direction and the direction read in mils off the circular base. Subjects
were individually tested in a lighted but enclosed room by showing them
the correct direction to true north with the mils and the pointer
correctly oriented. Each subject was then asked to move the pointer as
close as possible to the actual direction of six local areas with which
the student had frequent contact, e.g., student mail room, post exchange,
etc. Appendix A contains a scoring guide of all locations and their
correct directions. The subjects were also required to point the
direction to six cities using the pointing instrument, thereby providing
measures of both local and national geographical orientation.

The second test instrument used in this study was a mental imagery
exercise consisting of a single sheet shown to the subjects with square
grids covering approximately two-thirds of the page. Individual subjects
were asked to close their eyes and imagine themselves at the top of the
series of squares or grids facing a specified direction. They were then
asked to imagine themselves walking along the grid lines in whatever
direction and for whatever distance the experimenter instructed; then,
at various points along this path, they were asked what direction they
were facing. Each subject completed three of these mental imagery
exercises. Instructions with the plotted paths for each of the three
exercises are presented in Appendix B to this paper.

PROCEDURE 

Subjects were randomly selected for each of the two groups as
previously described and were run individually. The experimenter briefly
described the study to each subject and obtained informed consent. Then
each subject was taken into a lighted room where the pointing instrument
was located. There was no attempt to eliminate directional visual cues
within the room. The subject was shown the operation of the pointing
instrument, and then the instrument pointer was placed on true north and
the subject asked to point to the previously described locations.
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RESULTS 


LOCAL  POINTS 

One-way analysis of variance was used to evaluate the group
differences in pointing to six local areas with which the subjects had
daily to weekly contact. Absolute error scores measured in mils from the
actual azimuth measured from true north were used in this analysis as
the dependent variable. Group assignment was the independent variable,
with group one consisting of subjects who had scored above the median on
a field self-location test and group two consisting of those who had
scored below the median on the same self-location test. Table 1 presents
the results of this analysis.
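The dependent variable described above, absolute error in mils from the actual azimuth, can be sketched as follows. The correct azimuths are those listed for the six local points in Appendix A. Two things are assumptions of this sketch rather than statements from the text: the error is taken as the shorter way around the 6400-mil circle, and a subject's score is the mean error over the six points. The function names and the example responses are hypothetical.

```python
MILS_PER_CIRCLE = 6400

# Correct azimuths (mils) for the six local points listed in Appendix A.
LOCAL_AZIMUTHS = [5855, 5075, 3610, 2490, 1825, 4900]


def absolute_error_mils(pointed: float, actual: float) -> float:
    """Smallest angular separation, in mils, between two azimuths.

    Wrapping around the circle is an assumption of this sketch.
    """
    diff = abs(pointed - actual) % MILS_PER_CIRCLE
    return min(diff, MILS_PER_CIRCLE - diff)


def mean_absolute_error(pointed, actual=LOCAL_AZIMUTHS) -> float:
    """Mean absolute pointing error over a set of locations."""
    errors = [absolute_error_mils(p, a) for p, a in zip(pointed, actual)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical responses from one subject; not data from the study.
    responses = [6100, 4800, 3500, 2700, 1600, 5200]
    print(mean_absolute_error(responses))
```

Without the wrap-around step, a response of 6300 mils to a target at 100 mils would be scored as a 6200-mil error rather than the 200 mils that actually separate the two directions.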

Group one (high self-location scores) performed significantly (p < .04)
better than group two (low self-location scores) on pointing to local
points, as was expected. Table 2 presents the means, standard deviations,
and standard errors for these two groups.


Insert  Table  1  and  2  about  here 


As can be seen from these tables, the relative difference is rather small
when the mils are converted to degrees (approximately 15° error for group
one and 18° error for group two). Although this is a relatively small
difference, these data provide evidence as to the utility of a pointing
instrument in differentiating between high and low scorers in
self-location tasks.
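The conversion from mils to degrees uses the relation stated in the table notes, 6400 mils = 360°. A minimal sketch, with an illustrative function name, applied to the group means reported in Table 2:

```python
def mils_to_degrees(mils: float) -> float:
    """Convert military mils to degrees, using 6400 mils = 360 degrees."""
    return mils * 360.0 / 6400.0


if __name__ == "__main__":
    # Group means in mils as reported in Table 2.
    for group, mean_mils in (("Group 1", 264), ("Group 2", 332)):
        print(f"{group}: {mean_mils} mils = {mils_to_degrees(mean_mils):.2f} degrees")
```

One mil is therefore 0.05625 degrees, so the 68-mil difference between the two group means corresponds to a little under 4 degrees.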

DISTANT  CITIES 

One-way analysis of variance, as previously described in the analysis
of local points, was used to analyze the differences in groups for
pointing to distant cities. The results of this analysis are presented
in Tables 3 and 4.


Insert Table 3 and 4 about here


As in the previous analysis, significant differences were obtained between
groups (p < .03) on pointing to distant cities. Again, examination of the
results of the analysis of variance and of the means, SDs, and SEs
reveals that the pointing instrument was effective in differentiating
between groups.
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VISUAL  IMAGERY 


The third analysis, as in the first and second, revealed significant
differences (p < .002) between the two groups on the visual imagery tasks.


Insert  Table  5  and  6  about  here 


As  can  be  seen  from  an  examination  of  Tables  5  and  6  the  visual  imagery 
task  produced  what  appears  to  be  the  greatest  magnitude  of  differences. 

CONCLUSION 

The purpose of this study was to examine relationships among
self-location abilities and performance on an orientation task requiring
estimates of compass directions and geographical spatial orientation
using visual imagery. The results of the preliminary research have
clearly demonstrated that high scorers and low scorers on a
self-location test can be differentiated by use of a simple pointing
instrument and visual imagery task. The results, although promising,
must be accepted with caution due to the relatively small sample size,
lack of biographical data on subjects, lack of test-retest reliabilities
using the instruments, contamination of the criterion variable,
relatively little variation in the criterion variable, and other
uncontrolled variables which may impact upon spatial orientation and
self-location skills and which were not included in this pilot research.
These same cautions, however, provide the foundation for an expanded
investigation in which a multivariate statistical design will allow for
greater control of variables and analysis of their contributions to
performance in self-location and target location abilities.
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TABLE  1^ 

Analysis of Variance of Mean Errors in Pointing to Local Points
for Groups 1 and 2

Source                         SS        df    MS       F

Between Groups (Treatment)     34884      1    34884    4.60
Within Groups (Error)          212431    28    7587
Total                          247315    29

Note. N=30; 15 per group. Numbers rounded to nearest whole number.
Unit of measure is mils, with 6400 mils = 360°.
Group 1 = Subjects scoring above median on self-location test.
Group 2 = Subjects scoring below median on self-location test.
F ratio significant at p < .04.
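The entries in an analysis of variance table are tied together by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below recomputes the Table 1 values from the printed sums of squares; the function name is illustrative.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


if __name__ == "__main__":
    # Sums of squares and degrees of freedom as printed in Table 1.
    ms_b, ms_w, f_ratio = anova_f(34884, 1, 212431, 28)
    print(f"MS between = {ms_b:.0f}, MS within = {ms_w:.0f}, F = {f_ratio:.2f}")
```

The same function can be applied to the sums of squares reported for the distant cities and visual imagery analyses.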


TABLE 2

Means and Standard Deviations and Errors for Groups
on Pointing to Local Points

Group      Mean    SD     Standard Error

1          264      67    17.32
2          332     103    26.68
Total      298      92    16.86

Note. N=30; 15 per group. Numbers rounded to nearest whole number.
Unit of measure is mils, with 6400 mils = 360°.
Group 1 = Students scoring above median on self-location test.
Group 2 = Students scoring below median on self-location test.
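With equal group sizes, the F ratio in Table 1 can be approximately recovered from the group means and standard deviations in Table 2, which gives a simple consistency check between the two tables. The sketch below assumes the reported SDs are sample standard deviations (n - 1 in the denominator); the function name is illustrative, and because the printed means and SDs are rounded, only approximate agreement should be expected.

```python
def f_from_summary(means, sds, n_per_group):
    """One-way ANOVA F ratio from group means and SDs (equal group sizes).

    Assumes sds are sample standard deviations (n - 1 denominator).
    """
    k = len(means)
    grand_mean = sum(means) / k  # valid because group sizes are equal
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n_per_group - 1) * sum(sd ** 2 for sd in sds)
    df_between = k - 1
    df_within = k * (n_per_group - 1)
    return (ss_between / df_between) / (ss_within / df_within)


if __name__ == "__main__":
    # Means and SDs in mils as printed in Table 2; 15 subjects per group.
    print(round(f_from_summary([264, 332], [67, 103], 15), 2))
```

The value obtained this way is close to the F of 4.60 printed in Table 1; the small difference is attributable to rounding of the tabled means and standard deviations.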
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TABLE  3^ 

Analysis of Variance of Mean Error in Pointing to Distant Cities
for Groups 1 and 2

Source                         SS        df    MS        F

Between Groups (Treatment)     150946     1    150946    5.43
Within Groups (Error)          778886    28     27817
Total                          929832    29

Note. N=30; 15 per group. Numbers rounded to nearest whole number.
Unit of measure is mils, with 6400 mils = 360°.
Group 1 = Students scoring above median on self-location test.
Group 2 = Students scoring below median on self-location test.
F ratio significant at p < .04.


TABLE 4

Means and Standard Deviations and Errors for Groups
on Pointing to Distant Cities

Group      Mean    SD     Standard Error

1          366      83    21.49
2          507     221    56.98
Total      437     179    32.69

Note. N=30; 15 per group. Numbers rounded to nearest whole number.
Unit of measure is mils, with 6400 mils = 360°.
Group 1 = Students scoring above median on self-location test.
Group 2 = Students scoring below median on self-location test.
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TABLE  5^ 

Analysis of Variance of Scores Obtained on Visual Imagery Test
for Groups 1 and 2

Source                         SS      df    MS      F

Between Groups (Treatment)     2484     1    2484    11.73
Within Groups (Error)          5933    28     212
Total                          8417    29

Note. N=30; 15 per group. Numbers rounded to nearest whole number.
Scores represent percent correct.
Group 1 = Subjects scoring above median on self-location test.
Group 2 = Subjects scoring below median on self-location test.
F ratio significant at p < .002.

TABLE 6(a)

Means and Standard Deviations and Errors for Groups
on Visual Imagery Test


Group(b)  Mean     SD     Standard Error

1          90      12     3.05
2          72      17     4.36
Total      81      17     3.11

Note. N=30; 15 per group. Numbers rounded to nearest whole number.

(a) Unit of measure is in Mils with 6400 mils = 360°.
(b) Group 1 = Students scoring above median on self-location test.
    Group 2 = Students scoring below median on self-location test.
APPENDIX A
POINTING INSTRUMENT SCORING GUIDE


      Location Name         Location Azimuth

  1.  Officers Club         5855
  2.  Main PX               5075
  3.  Ft Sill Blvd Exit     3610
  4.  Key Gate              2490
  5.  Mail Room             1825
  6.  CF Department         4900
  7.  Oklahoma City         0710
  8.  New Orleans           2150
  9.  Dallas                2550
 10.  Houston               2670
 11.  Kansas City, MO       0520
 12.  Denver                5610
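
The tables report pointing error in mils, but the exact scoring rule is not given in this section. One plausible reading, offered here only as an assumption, is that a response is scored as the smallest angular difference between the azimuth the subject pointed to and the guide azimuth, allowing for wraparound at 6400 mils. The Python sketch below uses the azimuths from the scoring guide; the function names are hypothetical.

```python
MILS_PER_CIRCLE = 6400  # 6400 mils = 360 degrees, per the table notes

# Azimuths in mils, copied from the scoring guide above.
SCORING_GUIDE = {
    "Officers Club": 5855,
    "Main PX": 5075,
    "Ft Sill Blvd Exit": 3610,
    "Key Gate": 2490,
    "Mail Room": 1825,
    "CF Department": 4900,
    "Oklahoma City": 710,
    "New Orleans": 2150,
    "Dallas": 2550,
    "Houston": 2670,
    "Kansas City, MO": 520,
    "Denver": 5610,
}


def pointing_error(pointed_mils, location):
    """Smallest angular difference (mils) between a response and the guide azimuth.

    Assumed scoring rule: the paper does not state how error was computed.
    """
    diff = abs(pointed_mils - SCORING_GUIDE[location]) % MILS_PER_CIRCLE
    return min(diff, MILS_PER_CIRCLE - diff)


def mean_error(responses):
    """Mean pointing error over a {location: pointed azimuth} mapping."""
    errors = [pointing_error(az, name) for name, az in responses.items()]
    return sum(errors) / len(errors)
```

The wraparound matters for targets near north: a response of 6300 mils toward Kansas City (guide azimuth 0520) is 620 mils off, not 5780.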
APPENDIX B
VISUAL IMAGERY EXERCISE
NARRATIVE INSTRUCTIONS GRID #1

1. Graphic Representation: See Attached Sheet

2. Scoring Procedure: Score one point for each correct direction given
by the subject. Ask the subject for his direction at each place
indicated in the narrative.

3.  Narration: 

a.  Close  your  eyes  and  imagine  yourself  facing  South  on  the  grid 
previously  shown  to  you. 

b.  Proceed  two  blocks  South,  Stop. 

c.  Turn  90°  left,  now  proceed  two  blocks  and  Stop. 

What  direction  are  you  now  facing?     (Correct  answer  is  East) 

d.  Now  turn  left  90°  and  proceed  two  blocks,  Stop. 

e.  Turn  left  90°  and  proceed  two  blocks  and  Stop. 

What  direction  are  you  now  facing?     (Correct  answer  is  West) 

If  the  subject  correctly  answers  both  questions  score  2  for 
this  example. 

4.  Now  give  the  subject  a  blank  grid  and  ask  him  to  draw  the  directions 
he  followed  in  this  example. 

5.  Ask  the  subject  for  any  questions  to  clarify  the  procedure. 

6.  Proceed  to  the  next  exercise  if  the  subject  understands  the  directions. 
NARRATIVE INSTRUCTIONS GRID #2

1. Close your eyes. Imagine yourself facing South on the grid you were
just shown.

2. Proceed one block and Stop.

3. Turn 90° left, walk one block and Stop.

4. Turn 90° right, walk one block and Stop.

What direction are you now facing? (A3, South)

5. Turn right 90°, proceed one block and Stop.

6. Turn right again 90°, proceed one block and Stop.
What direction are you now facing? (A5, North)

7. Turn right 90°, proceed one block and Stop.
What direction are you facing? (A6, East)

8. Turn left 90°, proceed one block and Stop.

9. Turn left 90°, proceed one block and Stop.
What direction are you now facing? (A8, West)

10.  On  this  blank  grid  page  draw  the  route  you  have  been  following. 
NARRATIVE INSTRUCTIONS GRID #3

1. Close your eyes. Imagine yourself facing East on the grid you were
just shown.

2. Proceed two blocks and Stop.

3. Turn right 90°, now turn 45° more to the right and proceed two blocks
and Stop.

What direction are you now facing? (A2, SW)

4. Turn left 90°, now turn 45° more to the left and proceed two blocks
and Stop.

What direction are you now facing? (A3, E)

5. Turn left 180° then turn right 45°.

What direction are you now facing? (A4, NW)

6. Proceed two blocks in this direction and Stop. Turn left 45°.
What direction are you now facing? (W)
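
The answer keys in the narratives above can be checked mechanically, since walking does not change the direction faced and each turn is a multiple of 45°. The following Python sketch tracks heading on an eight-point compass; the function names and the tuple encoding of turns are illustrative and are not part of the original exercise.

```python
# Eight compass points in clockwise order; a right turn moves forward in the list.
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def turn(heading, direction, degrees):
    """Return the new heading after turning left or right by a multiple of 45 degrees."""
    if degrees % 45 != 0:
        raise ValueError("turns must be multiples of 45 degrees")
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    steps = degrees // 45
    if direction == "left":
        steps = -steps
    return COMPASS[(COMPASS.index(heading) + steps) % len(COMPASS)]


def final_heading(start, turns):
    """Apply a sequence of (direction, degrees) turns to a starting heading."""
    heading = start
    for direction, degrees in turns:
        heading = turn(heading, direction, degrees)
    return heading


# Grid #3, steps 1-3: start facing East, turn right 90 and then 45 more.
print(final_heading("E", [("right", 90), ("right", 45)]))
```

Running the full sequences for Grids #1 through #3 this way gives the same directions as the printed answers.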
ESTABLISHING THE PROGRAM


Prior to detailing the application of job analysis techniques to the
design of enlisted medical training, it would appear appropriate to
outline how and why the Army Medical Department came to use the
Instructional Systems Development technology, which includes the use of
job analysis.

As has been the case with its sister services, the U.S. Army has,
for many years, been under the scrutiny of the Congress and the Federal
Executive Branch. The focus of this scrutiny has been an effort to
restructure the training establishment with the intent of changing the
student to staff ratio and to make more personnel available for
assignment to combat units. There were a number of ancillary issues
raised, two of which were the methods of instruction and the cost of the
training.

The impact of the Congressional concern was expressed in a legislative
amendment to the FY 76 Defense Authorization Bill (House Report
94-413) which mandated a study of DOD training establishments. The
effect of this legislation on the U.S. Army was to force the issues of
modernizing training procedures, streamlining training structures, and
minimizing training fund expenditures. During the same year, the DOD, in
its Report of Training to the Congress, endorsed a new training
development model, the Instructional Systems Development (ISD) approach,
which subsequently was adopted.

The ISD philosophy was to implement training based upon tasks the
trainee would subsequently perform on the job. This philosophy placed
the Army Medical Department and the Academy of Health Sciences in a
dilemma. The Academy of Health Sciences was committed to enlisted
technical training based on the traditional model of education. (The
Academy of Health Sciences is the Army Medical Department's only formal
school with a staff and faculty of approximately 2150, a resident student
population of more than 33,000 annually, and over 30,000 nonresident
students enrolled in extension courses.)

To further complicate this dilemma, two other problems arose. The
first of these problems lay in the area of gathering sufficient expertise
to implement the new training philosophy, task based training, while
continuing the on-going training mission to support the needs of the Army.
The second problem was the resistance of a largely successful
organization to a basic change in both philosophy and organization.

The final catalyst for this monumental change in philosophy and
method of operation was the assignment of a new Superintendent to the
Academy of Health Sciences. The arrival of a new commander, with a
pragmatic approach, resulted in considerable acceleration of the change
process and provided guidance in terms of product oriented direction with
a rigid timetable.

With the philosophical decisions made, the next problem was to
implement the ISD approach. In order to accomplish this, a task force
was established which drew upon the talents and resources available
within the Academy of Health Sciences. There were two basic problems in
constituting the Task Force. The first was the necessity to continue the
traditional training program, thus somewhat restricting the personnel who
could be removed from their teaching or administrative positions. The
second problem lay in taking the ISD directional pamphlets, which were
largely a philosophical approach to developing task based training, and
converting this philosophy into a pragmatic product oriented mode of
operation.

These two problems were solved by detailing a number of highly
educated personnel to the Task Force and allowing the group approximately
thirty days to thoroughly review, digest, and educate themselves as to
the vagaries of implementing the ISD philosophy. The self-education
process included a review of Congressional hearings and documents,
technical materials from the ISD model and the history of the ISD process.
By November 1976 each member of the Task Force had an overview of how his
efforts would fit into the total ISD picture. At that point the Task
Force embarked on its first ISD effort, to establish and test the
plan by developing a single course of instruction. The specialty chosen
for the initial trial effort was the Medical Specialist, MOS 91B, with a
target date for course validation of October 1977.

Now, to move from the history of establishing the ISD method of
training development to the initiation of the job analysis efforts, in
April 1977, the Academy of Health Sciences obtained a qualified job
analyst. Despite the fact that this occurred considerably after the
initiation of the Medical Specialist ISD effort, the job analysis
procedure was begun. The results were to serve two purposes: first, to
establish lines of communications with the Army Occupational Survey
Program data base at the U.S. Army Military Personnel Center (MILPERCEN)
in Alexandria, Virginia and second, to validate the efforts of the Task
Force in establishing a task based training package.

Personnel in the Military Occupational Data Division of MILPERCEN
were extremely cooperative in allowing the establishment of informal
lines of communications and providing data as rapidly as their system and
the U.S. Postal Service would allow. The Academy of Health Sciences was
fortunate in having coordinated with the Military Occupational Data
Division during the period of 1976 and early 1977 in the construction of
inventory questionnaires for most of the medical specialties. The
data to support the ISD efforts had been gathered from September 1976
through April 1977 and much of the data was available for processing.
The job analysis of the Medical Specialist, MOS 91B, began in May of 1977
and a final occupational survey report was published in September of 1977.
Interaction between the job analyst and the other members of the Task
Force led to consideration of survey findings in the development of the
new Medical Specialist course.
By October 1977, the Task Force had established a plan for implementing
ISD methodology for course development at the Academy of Health Sciences.
The plan had been tested in the development of the course of instruction for
the Medical Specialist MOS and a recommendation was made and approved to
formalize the organization and methodology and continue with the remaining
thirty specialties and associated courses of instruction.




JOB ANALYSIS FINDINGS


Since the initiation of the Instructional Systems Development process
at the Academy of Health Sciences, occupational surveys have been
completed for nine specialties. These specialties are:

Medical  Specialist,  MOS  91B 

Medical  Supplyman,  MOS  76J 

Hospital  Food  Service  Specialist,  MOS  94F 

Veterinary  Specialist,  MOS  91R 

Behavioral  Science  Specialist,  MOS  91G 

Patient  Administration  Specialist,  MOS  71G 

X-Ray  Specialist,  MOS  91P 

Clinical  Specialist,  MOS  91C 

Operating Room Specialist, MOS 91D

In addition, analysis of the occupational data for two other
specialties is in progress. These are the:

Medical  Laboratory  Specialist,  MOS  92B 

Dental Removable Prosthetic Specialist, MOS 42D

In an attempt to illustrate the utility of the occupational survey
data, selected findings will be briefly discussed. One of the most
valuable contributions that an occupational survey can provide to the
training development process is the identification of the different jobs
which exist within each specialty and the tasks individuals perform when
accomplishing those jobs. For example, the job structure analysis for the
Hospital Food Service Specialist, MOS 94F, occupational survey indicated
the existence of eleven different jobs within the specialty. The eleven
different jobs could be grouped together to form two large clusters of
jobs and two smaller separate job classifications. Personnel in one of
the large job clusters, titled Food Preparation Specialists, who
represented 53 percent of the sample, performed tasks virtually identical
to those performed by another specialty, the Food Service Specialist, MOS
94B. On the basis of this information, coupled with additional data,
consideration is being given to consolidating the food preparation phase
of training for the two specialties at a single location with an
additional period of training provided for Hospital Food Service
Specialists in the areas of their specialty peculiar to the hospital
environment.

A  second  example  of  the  utility  of  the  job  structure  information 
occurred  in  the  Patient  Administration  Specialist,  MOS  71G,  occupational 
survey.    The  job  structure  analysis  identified  eleven  separate  jobs  within 
the  specialty.    A  number  of  these  jobs  were  found  to  be  performed  by 
personnel  in  their  second  or  subsequent  enlistments.    In  reaction  to  this 
information,  the  task  analysis  team  is  recommending  that  training  in  these 
areas  be  given  at  some  time  other  than  in  the  initial  resident  course.  If 
such  a  recommendation  is  approved,  there  could  be  a  significant  savings  in 
training  funds. 
However, all occupational survey information must be considered only
as a point of departure. Prior to the implementation of any
recommendations from an occupational survey, information related to many
other factors must be considered. Some of these factors are overall
contributions to unit mission, the ability of the individual to perform
collective tasks, and the impact on the individual's ability to expand his
base of knowledge.

A second aspect of occupational survey information which impacts on
training decisions relates to the probability that an individual will
perform a task. As an example, in the Medical Supplyman (MOS 76J)
occupational survey, there were very few tasks performed by large
percentages of survey respondents. The inventory questionnaire included a
total of 392 task statements, which was a reasonably comprehensive list.
The average number of tasks performed by any one respondent was 55, with
the average dropping to 46 tasks when the data base was restricted to
those in their first enlistment (the target for the initial resident
training course). This information, when considered in concert with two
other facts, (1) there were three tasks performed by at least half of the
target population, and (2) there were an additional sixteen tasks
performed by at least one-third of the target population, led to the
conclusion that a task based cost-effective training course would be
difficult to develop.
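
The kind of screening described for the Medical Supplyman survey, counting how many tasks are performed by at least half and by at least one-third of the target population, can be sketched in a few lines of Python. The task names and percentages below are hypothetical placeholders, not survey data, and the function name is illustrative.

```python
def tasks_at_or_above(percent_performing, threshold):
    """Return the tasks performed by at least `threshold` percent of respondents."""
    return sorted(t for t, p in percent_performing.items() if p >= threshold)


# Hypothetical percent-performing figures for first-enlistment respondents.
sample = {"Task A": 62.0, "Task B": 51.0, "Task C": 40.0, "Task D": 12.0}

half = tasks_at_or_above(sample, 50.0)
third = tasks_at_or_above(sample, 100.0 / 3.0)
# Tasks that clear the one-third mark but not the one-half mark.
additional = [t for t in third if t not in half]
print(len(half), len(additional))
```

When both counts are small relative to the size of the inventory, as they were in the survey described above, few tasks are common enough to anchor an initial resident course.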

A second example of the impact of task performance data on training
development came from the occupational survey of the Medical Specialist,
MOS 91B. The survey data yielded a rather broad base of tasks which would
be appropriate for inclusion in an initial resident training course.
There were, however, two substantial problems with this information: (1)
many of the tasks which were performed by personnel at that time were not
the ones which would be required to be performed should the individual be
placed in a hostile environment (because the Medical Specialist, MOS 91B,
is the individual commonly referred to as the Combat Medic); and (2) many
of the tasks performed by individuals are not appropriate to include in a
specialty training course (these are primarily those tasks related to
vehicle maintenance, a responsibility inherent in the job of a soldier).

Another  illustration  of  the  impact  of  occupational  survey  information 
is  the  discovery  of  the  unpopular.    These  are  findings  which  may  be 
illustrated  by  the  following  examples.     In  the  Clinical  Specialist  (MOS 
91C)  occupational  survey,  the  job  structure  analysis  identified  a  small 
job  group  (representing  approximately  four  percent  of  the  population) 
where  the  personnel  were  performing  tasks  which  were  the  same  as  those 
performed  by  a  relatively  large  job  group  in  the  Medical  Specialist,  MOS 
91B,  occupational  survey.    This  was  an  unpopular  discovery  because  the 
Clinical  Specialist,  MOS  91C,  receives  approximately  one  year  of  training 
while  the  Medical  Specialist,  MOS  91B,  receives  approximately  twelve  weeks 
of  training.    Another  example  of  an  unpopular  finding  occurred  in  the 
Medical  Supplyman,  MOS  76J,  occupational  survey.    The  survey  data  revealed 
a  differential  utilization  pattern  between  the  male  and  female  survey 
respondents.    The  male  respondents  performed  shipment  and  storage  tasks  to 
a much greater degree than the female respondents, who performed
administrative supply tasks to a substantially greater degree.
A final illustration of the impact of occupational survey data lies
in the discovery that a specialty can be appropriately described and that
training prepares the individual to perform his/her job. The discovery
that all is reasonably well within a specialty is too often dismissed
while a discovery that something is wrong or in error is trumpeted out of
proportion. This impact of the occupational survey information is as
important as any other impact and perhaps the most overlooked. In
addition, conducting an occupational survey leading to the conclusion that
all is well is often not very exciting. The finding, for example, that the
Veterinary Specialist (MOS 91R) has a broad and complex job, which included
conducting the food inspections for all Army installations under a myriad
of regulations and guidelines, was not new to anyone. The finding that an
X-ray Specialist, MOS 91P, must be trained to perform a wide range of
different radiographic tests was a well-known fact prior to the completion
of the occupational survey. However, what is important is that after the
completion of the occupational survey, the feelings, intuitions, and
preconceived notions can be validated and the training programs can be
based on empirically substantiated information.
NOW AND THE FUTURE


The implementation of this new method of training development is well
underway. To the present, job analysis has been accomplished for eleven
specialties. Task analysis has been completed for five of these
specialties and is in progress for an additional three. A new course has
been designed and tested for one specialty. The program is clearly still
in its infancy. With the development of this new approach to course
construction have come many problems, two of which will be discussed.

One of the major areas of concern with the new approach is the
relationship between the "what is," as represented by the occupational survey
information,  and  the  "what  may  be,"  when  personnel  must  perform  in  a 
hostile  environment.    Directly  related  to  this  concern  is  the  fact  of 
dealing  with  the  distinctly  unique  requirements  of  the  medical  community. 
The  concept  of  the  "critical  task"  takes  on  a  very  real  meaning  in  a 
medical  emergency.    Training  programs  must  be  designed  to  prepare  the 
individual  to  perform  tasks  for  which  the  probability  of  performance  may 
be  limited.    This  requires  exposure  to  the  task,  not  only  in  the  training 
environment,  but  also  in  some  form  of  continuing  training  beyond  the 
resident course. The use of unit training and Training Extension Courses
(TEC) is a partial answer to this problem.

A  second  area  of  concern  with  the  new  approach  involves  the  cognitive 
nature  of  many  of  the  tasks  performed  by  medical  personnel.    This  aspect 
of  task  definition  and  performance  became  increasingly  evident  in  the 
development  and  analysis  of  tasks  for  the  Behavioral  Science  Specialist, 
MOS  91G.    Personnel  in  this  specialty  deal  with  individuals  who  have 
problems  coping  with  their  environment  and  manifest  any  number  of  external 
and  internal  abnormal  behaviors.    The  normal  task  analysis  processes 
(standards,  conditions,  cues,  etc.)  were  not  derived  and  they  are  not 
generally  effective  in  dealing  with  tasks  related  to  human  cognitive 
skills.    In  this  area  the  Academy  of  Health  Sciences  is  developing  a 
supplement  to  the  ISD  model  to  aid  in  the  development  of  training  in  the 
area  of  cognitive  skills. 

But what does the future hold for continued implementation of the job
analysis effort within the Army medical training environment? The
immediate future appears to be relatively well planned with ISD efforts
proposed for all of the enlisted medical specialties. These efforts alone
will consume the better part of the next three to four years. In addition,
there are a number of special projects which illustrate the growth of the
ISD program in the medical training community. Such special efforts are:
the development of a pre-command course for medical command selectees
(what do medical commanders do and what do they need to know?), an attempt
to design a front end analysis effort to facilitate the design of a course
of training for the Special Forces Aidman (a distinctly different type of
medic), the beginning of ISD efforts in the officer arena (a new
undertaking in the medical profession), and an assessment of the supervisory
and management skills required of commissioned and noncommissioned
officers.
CODAP: A NEW MODULAR APPROACH TO OCCUPATIONAL ANALYSIS

By

Michael C. Thew and Johnny J. Weissmuller

INTRODUCTION

The increasing complexity of career fields requires a corresponding increase
in the number of task items within an occupational survey. Survey booklets
containing 800 to 1,200 items are not unusual. Initially, the incumbent was
required to read every task item in order to locate those which were
relevant to the job. (Appendix A) Because this was an onerous chore,
tasks were overlooked and the reliability of the responses could be
questioned. By ordering the tasks on some type of commonality, an
organization takes place which simplifies the identification of tasks by
the incumbent. (Appendix B) This method of organizing the tasks by duties
within the job inventory is widely used and works well for data collection.
However, not all users find this organization useful when analyzing the
data for their particular needs. Recently, methods have been developed to
facilitate the reorganization of tasks into new categories called modules.
Module definition always occurs after the data base has been generated
from the survey instruments.

DATA  COLLECTION 
Tasks 

Tasks  Within  Duties 


DATA  PRESENTATION 
Tasks 
Duties 

Tasks  Within  Duties 
Modules


MODULE DEFINITION


The  two  steps  involved  in  creating  modules  are  definition  and  assignment. 
Definition  consists  of  defining  the  attributes  or  rules  for  the  organization 
of tasks into modules. Assignment consists of the application of those
rules  to  the  collection  of  tasks  into  modules.    This  is  usually  done  by  a 
person  who  is  judged  qualified  to  decide  whether  or  not  a  task  meets  the 
requirements  set  by  the  definition;  i.e.,  a  subject  matter  specialist. 
When  these  requirements  are  quantifiable  (measurable  by  a  range  of  numeric 
values)  an  automated  approach  of  combining  the  tasks  may  be  utilized.  Data 
displayed  by  user  defined  modules  may  provide  insights  about  the  survey 
which  are  not  readily  apparent  from  the  original  order.     This  is  an  extension, 
not  a  replacement,  for  the  task,  duty  or  task  within  duty  display  formats. 
The  best  method  is  always  determined  by  how  the  data  are  going  to  be  used 
which  is  an  especially  important  consideration  in  the  module  definition. 
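
The two steps, definition and assignment, can be sketched for the quantifiable case, where a module's rule is a numeric range on some task attribute. In the Python sketch below the attribute (a difficulty rating), its values, and the module names are hypothetical, and the code illustrates the idea only; it is not the CODAP implementation.

```python
from typing import Callable, Dict, List

Task = dict
Rule = Callable[[Task], bool]


def assign_to_modules(tasks: List[Task], definitions: Dict[str, Rule]) -> Dict[str, List[str]]:
    """Assignment step: apply each module's rule to every task."""
    modules: Dict[str, List[str]] = {name: [] for name in definitions}
    for task in tasks:
        for name, rule in definitions.items():
            if rule(task):
                modules[name].append(task["title"])
    return modules


# Hypothetical difficulty ratings attached to tasks from a job inventory.
tasks = [
    {"title": "CHANGE ENGINE OIL", "difficulty": 2.1},
    {"title": "INSPECT TIRES", "difficulty": 1.8},
    {"title": "INSTALL ENGINES", "difficulty": 7.4},
]

# Definition step: each module is a numeric range on the rating.
definitions = {
    "Basic": lambda t: t["difficulty"] < 4.0,
    "Advanced": lambda t: t["difficulty"] >= 4.0,
}

modules = assign_to_modules(tasks, definitions)
print(modules)
```

When the rule cannot be expressed numerically, the same structure applies with the rule replaced by a subject matter specialist's judgment, recorded as a list of marked tasks.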
EXAMPLES


The  following  examples  illustrate  a  few  applications  of  user  defined 
modules  in  occupational  analysis. 

Example 1: Relating Training Requirements to Tasks Performed. Suppose
a  technical  trainer  says,  "Right  now  every  student  is  taught  how  to 
overhaul  engines.    We  hypothesize  that  only  second  term  enlistees  are 
doing  this  job  while  first  termers  merely  assist  with  parts  of  the 
process.     If  this  is  true,  we  could  emphasize  the  training  on  those  tasks 
which  the  first  termers  actually  perform."    What  the  trainer  is  asking  for 
is  a  report  showing  the  percent  of  first  termers  and  the  percent  of  second 
termers  who  overhaul  engines.     Looking  at  the  task  inventory  list,  we  find 
there  are  no  tasks  titled  Overhauling  Engines.     Closer  examination  of  the 
Task  Inventory  list  reveals  that  several  tasks  might  be  associated  with 
engine  overhaul.    At  this  point,  a  subject  matter  specialist  familiar  with 
the  operation  of  overhauling  engines  is  asked  to  identify  which  of  the 
tasks  in  the  inventory  are  applicable.    A  mark  will  be  placed  by  those 
tasks  which  belong  to  the  new  module.     (Appendix  C)     The  tasks  can  now  be 
reorganized into a new pseudoduty or module labeled "Overhauling Engines".
The  reorganized  report  of  percent  members  performing  data  now  provides  the 
trainer  with  information  necessary  to  make  his  decision.     (Appendix  D)  It 
is  important  to  note  that  instead  of  constructing  and  administering  a  new 
survey,  we  have  decided  only  to  reorganize  the  existing  inventory  in  a 
manner  that  is  acceptable  to  the  user's  needs.    This  approach  reduces  both 
time  and  cost. 
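
A report of this kind can be sketched as follows. The Python example assumes each survey respondent is recorded with an enlistment term and the set of tasks performed; the record layout, the function name, the sample respondents, and the particular tasks placed in the "Overhauling Engines" module are all hypothetical, since the actual selection is made by the subject matter specialist.

```python
def percent_members_performing(respondents, module_tasks, group_key):
    """Percent of each group performing each task in a module."""
    totals = {}
    counts = {}
    for person in respondents:
        group = person[group_key]
        totals[group] = totals.get(group, 0) + 1
        for task in module_tasks:
            if task in person["tasks"]:
                counts[(group, task)] = counts.get((group, task), 0) + 1
    return {
        group: {task: 100.0 * counts.get((group, task), 0) / n for task in module_tasks}
        for group, n in totals.items()
    }


# Hypothetical module membership, as marked by a subject matter specialist.
overhauling_engines = [
    "INSPECT ENGINE VALVE GUIDES",
    "INSTALL CYLINDER LINERS",
    "INSTALL ENGINES",
]

# Hypothetical survey responses.
respondents = [
    {"term": "first", "tasks": {"INSPECT ENGINE VALVE GUIDES", "CHANGE ENGINE OIL"}},
    {"term": "first", "tasks": {"CHANGE ENGINE OIL"}},
    {"term": "second", "tasks": {"INSTALL CYLINDER LINERS", "INSTALL ENGINES"}},
    {"term": "second", "tasks": {"INSPECT ENGINE VALVE GUIDES", "INSTALL ENGINES"}},
]

report = percent_members_performing(respondents, overhauling_engines, "term")
```

No new data are collected; the existing responses are simply regrouped under the new module.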

Example 2: New Task Categories Vs Time-in-service. In another case,
someone might ask: "Suppose we separate a Task Inventory into five major
categories called Managerial, Clerical, Heavy, Light, and Dirty tasks.
Could we identify a relationship between time-in-service and the type of
task being performed?" Since there are no duties with these titles, the
five new modules must be defined. As in example 1, we will use a subject
matter specialist to identify those tasks which fall under the new module
definitions. (Appendix E) Then, four additional categories will be
produced representing people who have been in the service 1-24 months,
25-48 months, 49-96 months and more than 96 months. Combining the modules
defined earlier with these four descriptions, a report is produced that
addresses the user's question. (Appendix F) Another approach might use
male/female categories in place of time-in-service.
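
The time-in-service report can be sketched in Python as a cross-tabulation of the four service categories against the new modules. The statistic used here, the percent of each category performing at least one task in a module, is an assumption, as the layout of the actual report is given only in the appendix; the task-to-module assignments and respondents are hypothetical.

```python
from collections import Counter


def tis_category(months):
    """Map months of service onto the four categories named in the text."""
    if months < 1:
        raise ValueError("time in service must be at least 1 month")
    if months <= 24:
        return "1-24"
    if months <= 48:
        return "25-48"
    if months <= 96:
        return "49-96"
    return "over 96"


def module_by_tis(respondents, task_to_module):
    """Percent of each category performing at least one task in each module."""
    totals = Counter()
    hits = Counter()
    for person in respondents:
        category = tis_category(person["months"])
        totals[category] += 1
        modules_done = {task_to_module[t] for t in person["tasks"] if t in task_to_module}
        for module in modules_done:
            hits[(category, module)] += 1
    return {key: 100.0 * n / totals[key[0]] for key, n in hits.items()}


# Hypothetical assignments and responses.
task_to_module = {
    "DRAFT CORRESPONDENCE": "Clerical",
    "ANALYZE MAINTENANCE TRENDS": "Managerial",
    "CHANGE ENGINE OIL": "Dirty",
}
respondents = [
    {"months": 12, "tasks": {"CHANGE ENGINE OIL"}},
    {"months": 20, "tasks": {"CHANGE ENGINE OIL", "DRAFT CORRESPONDENCE"}},
    {"months": 120, "tasks": {"ANALYZE MAINTENANCE TRENDS", "DRAFT CORRESPONDENCE"}},
]
table = module_by_tis(respondents, task_to_module)
```

Replacing `tis_category` with a function that returns a male/female label would give the alternative breakdown mentioned above.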

Example 3: Associating Tasks with Training Standards. The Air Force has
established a document for every AFSC called the Specialty Training Standard
(STS). Supervisors in the field are familiar with this form and when
presenting data to these personnel, it should be organized accordingly.
(Appendix G) Again, a subject matter specialist is utilized in associating
the STS document with the Task Inventory.
Example 4: Computer Generated Modules. If the requirements from the
definition step are quantifiable, then the assignment of tasks to modules
can be computer generated. For example, the assignment could be based
on the probability of co-performance from a matrix containing the
probabilities that tasks are performed together. Using this matrix, tasks
which are likely to be performed together cluster into groups called task
modules. (Appendix H) The next step is for a subject matter specialist
to study those task modules and label each as a separate group such as
training module, etc.
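
The text does not say how the co-performance probabilities are computed or how the clustering is carried out, so the Python sketch below makes two labeled assumptions: co-performance is taken as the share of respondents performing both tasks among those performing either one, and tasks are grouped by linking every pair at or above a chosen threshold. This is a simple stand-in for illustration, not the CODAP procedure, and the responses are hypothetical.

```python
from itertools import combinations


def co_performance(responses, tasks):
    """Assumed measure: P(both performed | either performed) for each task pair."""
    matrix = {}
    for a, b in combinations(tasks, 2):
        both = sum(1 for r in responses if a in r and b in r)
        either = sum(1 for r in responses if a in r or b in r)
        matrix[(a, b)] = both / either if either else 0.0
    return matrix


def task_modules(tasks, matrix, threshold):
    """Group tasks by linking every pair whose co-performance meets the threshold."""
    parent = {t: t for t in tasks}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # shorten the path as we walk up
            t = parent[t]
        return t

    for (a, b), p in matrix.items():
        if p >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for t in tasks:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


tasks = [
    "CHANGE ENGINE OIL",
    "CHECK OR SERVICE OIL LEVELS",
    "DRAFT CORRESPONDENCE",
    "INSPECT MAINTENANCE RECORDS",
]
# Hypothetical responses: each set holds the tasks one respondent performs.
responses = [
    {"CHANGE ENGINE OIL", "CHECK OR SERVICE OIL LEVELS"},
    {"CHANGE ENGINE OIL", "CHECK OR SERVICE OIL LEVELS"},
    {"DRAFT CORRESPONDENCE", "INSPECT MAINTENANCE RECORDS"},
    {"DRAFT CORRESPONDENCE", "INSPECT MAINTENANCE RECORDS", "CHECK OR SERVICE OIL LEVELS"},
]
modules = task_modules(tasks, co_performance(responses, tasks), threshold=0.5)
```

The resulting groups are unlabeled; naming each one remains the subject matter specialist's job, as described above.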

Example 5: Relating Tools to Tasks Performed. Suppose we wish to look
at  the  association  of  tools  and  equipment  with  tasks  performed.  A 
difference  description  is  produced  by  comparing  job  descriptions  of  those 
people  who  do  and  don't  use  a  selected  piece  of  equipment.    This  identifies 
those  tasks  which  are  likely  to  be  related  to  the  use  of  the  tool  and  they 
become  members  of  the  new  module.    Analysis  reports  are  then  generated  by 
merging  several  tool  module  descriptions  by  case  membership  groups. 
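
A difference description of this kind can be sketched in Python by comparing the percent of tool users and non-users who perform each task. The record layout, the tool name, the cutoff for admitting a task to the tool's module, and the respondents are all hypothetical; the text does not specify how large a difference qualifies a task.

```python
def difference_description(respondents, tool, tasks):
    """Percent performing among tool users minus percent among non-users, per task."""
    users = [r for r in respondents if tool in r["tools"]]
    non_users = [r for r in respondents if tool not in r["tools"]]

    def percent(group, task):
        if not group:
            return 0.0
        return 100.0 * sum(1 for r in group if task in r["tasks"]) / len(group)

    diffs = {t: percent(users, t) - percent(non_users, t) for t in tasks}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


def tool_module(differences, min_difference):
    """Tasks whose difference meets an assumed cutoff become the tool's module."""
    return [task for task, diff in differences if diff >= min_difference]


tasks = ["ADJUST VALVE CLEARANCE", "DRAFT CORRESPONDENCE", "INSTALL ENGINES"]
# Hypothetical respondents with the tools they use and the tasks they perform.
respondents = [
    {"tools": {"TORQUE WRENCH"}, "tasks": {"INSTALL ENGINES", "ADJUST VALVE CLEARANCE"}},
    {"tools": {"TORQUE WRENCH"}, "tasks": {"INSTALL ENGINES"}},
    {"tools": set(), "tasks": {"DRAFT CORRESPONDENCE"}},
    {"tools": set(), "tasks": {"ADJUST VALVE CLEARANCE", "DRAFT CORRESPONDENCE"}},
]
differences = difference_description(respondents, "TORQUE WRENCH", tasks)
module = tool_module(differences, min_difference=25.0)
```

Tasks performed equally often by both groups show a difference near zero and stay out of the module.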

SUMMARY 

The  purpose  of  any  computerized  approach  to  problem  solving  is  to  provide 
the  information  necessary  for  making  decisions.    Computer  programs  have 
been  developed  to  produce  these  decision  making  reports  and  the  programs 
take  into  consideration  that  the  questions  asked  about  the  data  will 
differ by application. These programs are now an integral part of the
Comprehensive  Occupational  Data  Analysis  Programs  (CODAP)  system  at  the 
Air  Force  Human  Resources  Laboratory.     In  conclusion,  through  the  use 
of  user  defined  modules  we  have  realized  a  more  effective  utilization  of 
existing  data. 

Appendix A.


Job  Inventory  for  Vehicle  Maintenance 


JOB  INVENTORY  IN  ALPHABETICAL  SEQUENCE.  ONLY  42  OUT  OF  690  POSSIBLE 
TASKS  ARE  SHOWN. 

1.  ADJUST  VALVE  CLEARANCE 

2.  ADMINISTER  OR  SCORE  TESTS 

3.  ALIGN  OR  ADJUST  HEADLIGHTS 

4.  ANALYZE  CAUSE  OF  BRAKE  FAILURE 

5.  ANALYZE  CAUSE  OF  VEHICLE  FAILURE 

6.  ANALYZE  MAINTENANCE  TRENDS 

7.  CHANGE  ENGINE  OIL 

8.  CHECK  OR  SERVICE  OIL  LEVELS 

9.  CLEAN  BATTERY  POSTS 

10.  CONDUCT  CLASSROOM  TRAINING 

11.  CONDUCT  OR  ATTEND  STAFF  MEETINGS 

12.  COORDINATE  WITH  SUPPLIERS  TO  MAINTAIN  REQUIRED  PARTS 

13.  DEMONSTRATE  OPERATION  OF  EQUIPMENT 

14.  DISASSEMBLE  DISTRIBUTORS 

15.  DRAIN  COOLING  SYSTEMS 

16.  DRAFT  CORRESPONDENCE 

17.  ESTABLISH  MAINTENANCE  PROCEDURES 

18.  FLUSH  TRANSMISSIONS 

19.  INSPECT  BRAKES 

20.  INSPECT  ENGINE  VALVE  GUIDES 

21.  INSPECT  FRONT  END  ALIGNMENT 

22.  INSPECT  IGNITION  POINTS 

23.  INSPECT  MOTOR  MOUNTINGS 

24.  INSPECT  MAINTENANCE  RECORDS 

25.  INSPECT  TIRES 

26.  INSPECT  VALVE  COVER  GASKETS 

27.  INSTALL  BRAKE  LININGS 

28.  INSTALL  CYLINDER  LINERS 

29.  INSTALL  ENGINES 

30.  INSTALL  POINTS 

31.  INSTALL  TRAILER  HITCHES 

32.  ISSUE  OR  MAINTAIN  STOCK  ITEMS  OF  HIGH  VALUE 

33.  ISSUE  PARTS  FROM  STOCK  ROOM 

34.  MAINTAIN  ACCIDENT  LOG 

35.  MAINTAIN  INVENTORY  FORM  226 

36.  MAINTAIN  VEHICLE  MAINTENANCE  FORM  100 

37. MANUFACTURE ENGINE GASKETS

38.  OPERATE  ELECTRONIC  TEST  EQUIPMENT 

39.  OPERATE  TIRE  BALANCING  EQUIPMENT 

40.  PLAN  AIDS  FOR  TRAINING 

41.  PREPARE  ACCIDENT  REPORT  FORM  22 

42.  PREPARE  BRIEFINGS 

Appendix B.


Job  Inventory  for  Vehicle  Maintenance 


JOB  INVENTORY  IS  CATEGORIZED  BY  DUTIES  WITH  THE  APPLICABLE 
TASKS  SHOWN  IN  ALPHABETICAL  SEQUENCE.     SOME  DUTIES  AND  TASKS 
ARE  NOT  SHOWN. 

A.  ORGANIZING,  PLANNING  AND  MANAGING 

6.  ANALYZE  MAINTENANCE  TRENDS 

11.  CONDUCT  OR  ATTEND  STAFF  MEETINGS 

16.  DRAFT  CORRESPONDENCE 

17.  ESTABLISH  MAINTENANCE  PROCEDURES 
24.     INSPECT  MAINTENANCE  RECORDS 

42.     PREPARE  BRIEFINGS 

B.  TRAINING 

2.  ADMINISTER  OR  SCORE  TESTS 

10.  CONDUCT  CLASSROOM  TRAINING 

13.  DEMONSTRATE  OPERATION  OF  EQUIPMENT 

40.  PLAN  AIDS  FOR  TRAINING 

44.  PREPARE  LESSON  PLANS 

53.  SELECT  INDIVIDUALS  TO  ATTEND  TRAINING 

C. WORKING WITH FORMS

35.  MAINTAIN  INVENTORY  FORM  226 

36.  MAINTAIN  VEHICLE  MAINTENANCE  FORM  100 

41.  PREPARE  ACCIDENT  REPORT  FORM  22 

45.  PREPARE  SURPLUS  INVENTORY  FORM  695-7 

D.  PERFORMING  SUPPLY  FUNCTIONS 

12.  COORDINATE  WITH  SUPPLIERS  TO  MAINTAIN  REQUIRED  PARTS 

32.  ISSUE  OR  MAINTAIN  STOCK  ITEMS  OF  HIGH  VALUE 

33.  ISSUE  PARTS  FROM  STOCK  ROOM 

51.  RESEARCH  FEDERAL  STOCK  NUMBERS  OR  PART  NUMBERS 

54.  STOCK  PARTS,  SUPPLIES  OR  EQUIPMENT 

E.  TROUBLESHOOTING  VEHICLES 

4.  ANALYZE  CAUSE  OF  ENGINE  FAILURE 

5.  ANALYZE  CAUSE  OF  BRAKE  FAILURE 
20.     INSPECT  ENGINE  VALVE  GUIDES 

22.  INSPECT  IGNITION  POINTS 

23.  INSPECT  MOTOR  MOUNTINGS 

52.  ROAD  TEST  VEHICLES 

F.  REMOVING,  REPLACING  OR  CLEANING  PARTS 

7.  CHANGE  ENGINE  OIL 
9.     CLEAN  BATTERY  POSTS 

15.  DRAIN  COOLING  SYSTEMS 

19.  INSTALL  BRAKE  LININGS 

28.  INSTALL  CYLINDER  LININGS 

30. INSTALL POINTS

48.  REMOVE  OR  REPLACE  PISTONS  AND  RINGS 

Appendix C.


Job Inventory for Vehicle Maintenance


JOB INVENTORY IS IN ALPHABETICAL ORDER WITH ASTERISKS PLACED BY THOSE
TASKS IDENTIFIED BY A SUBJECT MATTER SPECIALIST AS PART OF OVERHAULING
AN ENGINE.

*1.  ADJUST  VALVE  CLEARANCE 

2.  ADMINISTER  OR  SCORE  TESTS 

3.  ALIGN  OR  ADJUST  HEADLIGHTS 

4.  ANALYZE  CAUSE  OF  BRAKE  FAILURE 

5.  ANALYZE  CAUSE  OF  VEHICLE  FAILURE 

6.  ANALYZE  MAINTENANCE  TRENDS 

7.  CHANGE  ENGINE  OIL 

*8.  CHECK  OR  SERVICE  OIL  LEVELS 

9.  CLEAN  BATTERY  POSTS 

10.  CONDUCT  CLASSROOM  TRAINING 

11.  CONDUCT  OR  ATTEND  STAFF  MEETINGS 

12.  COORDINATE  WITH  SUPPLIERS  TO  MAINTAIN  REQUIRED  PARTS 

13.  DEMONSTRATE  OPERATION  OF  EQUIPMENT 
*14.  DISASSEMBLE  DISTRIBUTORS 

*15.  DRAIN  COOLING  SYSTEMS 

16.  DRAFT  CORRESPONDENCE 

17.  ESTABLISH  MAINTENANCE  PROCEDURES 

18.  FLUSH  TRANSMISSIONS 

19.  INSPECT  BRAKES 

*20.  INSPECT  ENGINE  VALVE  GUIDES 

21.  INSPECT  FRONT  END  ALIGNMENT 

*22. INSPECT IGNITION POINTS

23.  INSPECT  MOTOR  MOUNTINGS 

24.  INSPECT  MAINTENANCE  RECORDS 

25.  INSPECT  TIRES 

*26.  INSPECT  VALVE  COVER  GASKETS 

27.  INSTALL  BRAKE  LININGS 

*28.  INSTALL  CYLINDER  LINERS 

*29.  INSTALL  ENGINES 

30.  INSTALL  POINTS 

31.  INSTALL  TRAILER  HITCHES 

32.  ISSUE  OR  MAINTAIN  STOCK  ITEMS  OF  HIGH  VALUE 

33.  ISSUE  PARTS  FROM  STOCK  ROOM 

34.  MAINTAIN  ACCIDENT  LOG 

35.  MAINTAIN  INVENTORY  FORM  226 

36.  MAINTAIN  VEHICLE  MAINTENANCE  FORM  100 

37.  MANUFACTURE  ENGINE  GASKETS 

*38. OPERATE ELECTRONIC TEST EQUIPMENT

39. OPERATE TIRE BALANCING EQUIPMENT

40. PLAN AIDS FOR TRAINING

41. PREPARE ACCIDENT REPORT FORM 22

42.  PREPARE  BRIEFINGS 

Appendix D.


"OVERHAULING  ENGINE"  Module 


THIS  MODULE  SHOWS  THOSE  TASKS  IDENTIFIED  AS  APPLICABLE  TO  OVERHAULING 
ENGINES.     SOME  TASKS  NOT  SHOWN. 


PERCENT MEMBERS PERFORMING: 1ST TERM, 2ND TERM


A. OVERHAULING ENGINES


1.  ADJUST  VALVE  CLEARANCE 

8.  CHECK  OR  SERVICE  OIL  LEVELS 

14.  DISASSEMBLE  DISTRIBUTORS 

15. DRAIN COOLING SYSTEMS

20. INSPECT ENGINE VALVE GUIDES

22. INSPECT IGNITION POINTS

26. INSPECT ENGINE COVER GASKETS

28. INSTALL CYLINDER LINERS

29. INSTALL ENGINE

38. OPERATE ELECTRONIC TEST EQUIPMENT

46. PREPARE VEHICLE MAINTENANCE FORM 100

48.  REMOVE  OR  REPLACE  PISTONS  OR  RINGS 

49.  REMOVE  OR  REPLACE  POINTS 


2.0 

32.1 

48.7 

4.6 

4.7 

62.7 

36.2 

10.5 

1.1 

38.7 

4.8 

56.3 

25.9 

19.2 

0.5 

66.9 

30.3 

26.4 

1.1 

43.6 

57.4 

10.1 

5.3 

26.7 

2.4 

47.8 

Appendix E.


Five  Categorical  Modules 


THESE  MODULES  SHOW  WHICH  TASKS  WERE  IDENTIFIED  AS  BELONGING  TO 
THE SPECIFIED CATEGORY. SOME TASKS NOT SHOWN.


A. MANAGERIAL


6.  ANALYZE  MAINTENANCE  TRENDS 

11.  CONDUCT  OR  ATTEND  STAFF  MEETINGS 

16.  DRAFT  CORRESPONDENCE 

24.  INSPECT  MAINTENANCE  RECORDS 

42.  PREPARE  BRIEFINGS 


B.  CLERICAL 


2.  ADMINISTER  OR  SCORE  TESTS 

16. DRAFT CORRESPONDENCE

34.  MAINTAIN  ACCIDENT  LOG 

35.  MAINTAIN  INVENTORY  FORM  226 

46.  MAINTAIN  VEHICLE  MAINTENANCE  FORM  100 


C.     HEAVY  TASKS 


29.  INSTALL  ENGINES 

33.  ISSUE  PARTS  FROM  STOCK  ROOM 

47.  REMOVE  OR  REPLACE  BATTERIES 

50.  REMOVE  OR  REPLACE  POWER  STEERING  UNITS 

55.  ROTATE  TIRES 


D.     DIRTY  TASKS 


7.  CHANGE  ENGINE  OIL 

9.  CLEAN  BATTERY  POSTS 

15.  DRAIN  COOLING  SYSTEMS 

29.  INSTALL  ENGINES 

47.  REMOVE  OR  REPLACE  BATTERIES 


E.     LIGHT  TASKS 


1.  ADJUST  VALVE  CLEARANCES 

3.  ALIGN  OR  ADJUST  HEADLIGHTS 

7.  CHANGE  ENGINE  OIL 

9.  CLEAN  BATTERY  POSTS 

49.  REMOVE  OR  REPLACE  POINTS 

Appendix F.


Percent  Members  Performing  Categorical  Modules 


THE  MODULES,  WITH  RELATED  TASKS,  SHOW  WHICH  CATEGORY  OF  PEOPLE  ARE 
PERFORMING  WHAT  TYPE  OF  TASK.     PERCENT  MEMBERS  PERFORMING  DATA  IS 
USED. 


A. MANAGERIAL


6.  ANALYZE  MAINTENANCE  TRENDS 

11.  CONDUCT  OR  ATTEND  STAFF  MEETINGS 

16.  DRAFT  CORRESPONDENCE 

24.  INSPECT  MAINTENANCE  RECORDS 

42.  PREPARE  BRIEFINGS 


1 

25 

49 

-24 

-96 

96+ 

0.0 

3.1 

25.2 

1.: 

7 

10.7 

89.3 

2 

36.7 

5.5 

1.6 

5.6 

42.3 

10.5 

0.0 

0.0 

15.6 

75.6 

B. CLERICAL


2.  ADMINISTER  OR  SCORE  TESTS 

16.  DRAFT  CORRESPONDENCE 

34.  MAINTAIN  ACCIDENT  LOG 

35.  MAINTAIN  INVENTORY  FORM  226 

46.  MAINTAIN  VEHICLE  MAINTENANCE  FORM  100 


C. HEAVY TASKS


29.  INSTALL  ENGINES 

33.  ISSUE  PARTS  FROM  STOCK  ROOM 

47.  REMOVE  OR  REPLACE  BATTERIES 

50.  REMOVE  OR  REPLACE  POWER  STEERING  UNITS 

55.  ROTATE  TIRES 


D.     DIRTY  TASKS 


7.  CHANGE  ENGINE  OIL 

9.  CLEAN  BATTERY  POSTS 

15. DRAIN COOLING SYSTEMS

29.  INSTALL  ENGINES 

47.  REMOVE  OR  REPLACE  BATTERIES 


E.     LIGHT  TASKS 


1. ADJUST VALVE CLEARANCES

3. ALIGN OR ADJUST HEADLIGHTS

7. CHANGE ENGINE OIL

9. CLEAN BATTERY POSTS

49. REMOVE OR REPLACE POINTS


5.6 

32.1 

10.5 

1.1 

4.3 

10.2 

36.7 

5.5

4.7 

50.1 

48.6 

J .  9 

16.3 

42.8 

10.2

0.5 

10.2

66.6 

12.7 

0.1 

26.5 

30.3 

8.8 

0.9 

10.9 

26.3 

25.5 

1.3 

63.7 

10.2 

1.6 

0.0 

15.1 

26.9 

9.9 

2.6 

72.6 

21.0 

4.1 

0.0 

66.7 

4.1 

0.0 

54.3 

5.0 

1.1 

0.0 

51.6 

4.7 

2.6 

0.1 

25.6 

30.3 

8.8 

0.9 

63.7 

10.2 

1.6 

0.0 

0.0 

5.1 

32.6 

15.5 

20.1 

22.6 

5.4 

1.0 

66.7 

30.9 

Appendix G.


Specialty  Training  Standard 


TASKS  ARE  ASSOCIATED  WITH  THE  SPECIALTY  TRAINING  STANDARD  FOR  THE 
VEHICLE  MAINTENANCE  PERSONNEL. 


IA DISASTER PREPAREDNESS & EMERGENCY PROCEDURES


151.  ATTEND  SAFETY  BRIEFINGS 

102.  MAINTAIN  FIRE  EXTINGUISHER  READINESS  FORM  672 

133.  PERFORM  SPOT  CHECKS  OF  SAFETY  READINESS 

26A.  PRACTICE  EMERGENCY  PROCEDURES 


IIB  SECURITY 


32.  ISSUE  OR  MAINTAIN  STOCK  ITEMS  OF  HIGH  VALUE 

196.  MAINTAIN  STOCK  INVENTORY 

599.  PLAN  SECURITY  PROGRAMS 

602.  CONDUCT  SECURITY  BRIEFINGS 


IVA     SUPERVISING  AND  TRAINING 


2.  ADMINISTER  OR  SCORE  TESTS 

10.  CONDUCT  CLASSROOM  TRAINING 

220.  SCHEDULE  WORK  ASSIGNMENTS 

319.  SUPERVISE  SUBORDINATES 

Appendix H. Computer Generated Module


THESE  MODULES  WERE  GROUPED  TOGETHER  BASED  ON  THEIR  PROBABILITY  OF 
BEING  PERFORMED  TOGETHER. 


A.     MINOR  ENGINE  OR  TRANSMISSION  SERVICING 


7.  CHANGE  ENGINE  OIL 

8. CHECK OR SERVICE OIL LEVELS

9. CLEAN BATTERY POSTS

15. DRAIN COOLING SYSTEMS

18. FLUSH TRANSMISSIONS


B.     SERVICING  ELECTRICAL  SYSTEMS 


14. DISASSEMBLE DISTRIBUTORS

22.  INSPECT  IGNITION  POINTS 

38.  OPERATE  ELECTRONIC  TEST  EQUIPMENT 

49.  REMOVE  OR  REPLACE  POINTS 


C. CLASSROOM TRAINING


2.  ADMINISTER  OR  SCORE  TESTS 

10.  CONDUCT  CLASSROOM  TRAINING 

13.  DEMONSTRATE  OPERATION  OF  EQUIPMENT 

156.  OPERATE  AUDIOVISUAL  EQUIPMENT 

170.  PLAN  AIDS  FOR  TRAINING 

43. PREPARE LESSON PLANS

189.  SIGN  OFF  TRAINING  RECORDS 

ABSTRACT


The  Command  and  General  Staff  College  (CGSC)  prepares  Army 
officers  for  duty  as  field  grade  commanders  and  principal  staff 
officers at brigade and higher echelons. The College requires
significant expenditures and provides the first, and for the
majority of field grade officers the only, formal Army training for
high level jobs. Despite the importance of the CGSC mission,
occupational  definition  of  post-CGSC  assignments  and  the  crosswalks 
to  training  needs  analysis  at  this  level  of  responsibility  have  not 
yet  been  objectively  addressed.     In  a  memorandum  to  the  Army  Research 
Institute  (ARI)  in  1977  the  CGSC  Commandant  stated,  "front-end 
analysis  to  support  curriculum  development  ...  is  one  of  the  most 
pressing  priorities  that  the  College  faces  today."    He  requested 
that  ARI  research  the  feasibility  of  using  the  ARI  Duty  Module 
concept  "to  provide  an  information  base  for  decision  on  further 
research  effort  and  its  direction." 

This research was directed to the examination of two disparate
sub-courses of the CGSC curriculum. Research design, results from
the feasibility prototype, and directions for further research are
discussed.

BACKGROUND


ARMY  RESEARCH  INSTITUTE  INVOLVEMENT  WITH  COMMAND  AND  GENERAL  STAFF 
COLLEGE  CURRICULUM  DEVELOPMENT 

In  a  memorandum  to  the  Army  Research  Institute  (ARI)  in  1977  the 
Command  and  General  Staff  College  (CGSC)  Commandant  stated,  "front-end 
analysis  to  support  curriculum  development  ...  is  one  of  the  most  pressing 
priorities  that  the  College  faces  today."    He  requested  that  ARI  research 
the  feasibility  of  using  the  ARI  Duty  Module  methodology  "to  provide  an 
information base for decision on further research effort and its direction."
The  feasibility  research  has  been  completed.    ARI  is  currently  working 
in  both  the  Analysis  and  Control  (external  evaluation  or  feedback) 
phases  of  the  Instructional  Systems  Development  (ISD)  of  CGSC  curriculum 
development.     The  ongoing  research  was  precipitated  by  the  Duty  Module 
feasibility  results,  statements  of  Human  Resource  Needs  (HRNs)  for  new 
methods  of  front-end  analysis  for  non-procedural  tasks  from  several  Army 
schools, and HRNs for feedback on training and education¹ from CGSC
graduates. 

CGSC  MISSION  REQUIRES  BOTH  TRAINING  AND  EDUCATION 

The mission of the Command and General Staff College² is to provide
instruction  for  officers  of  the  Active  Army  and  Reserve  components, 
worldwide,  so  as  to  prepare  them  for  duty  as  field  grade  commanders  and 
principal  staff  officers  at  brigade  and  higher  echelons. 

The  College  prepares  officers  to: 

- Command battalions, brigades, and equivalent-sized units
in peace or war.

- Train these units to accomplish their assigned missions.

- Employ and sustain weapon systems to optimize their effect
in the conduct of combined arms operations.

- Serve as principal staff officers from brigade through
division, to include support commands, and as staff officers
of higher echelons, including major Army, joint, unified,
or combined headquarters.


¹ The definitions of training and education for this paper are:
Training - Teaching specific skills which will be needed in the next
assignment. Education - Teaching broad knowledge areas as a foundation
for the requirements of all expected positions in the future,
not necessarily for the next assignment.

² 1977-78 Catalogue, US Army Command and General Staff College

CGSC offers a Master's degree in Military Arts and Sciences, and offers
the opportunity to obtain numerous other Master's level degrees from a
number of other colleges and universities. Although the junior officer
schools (Basic Course for second lieutenants and Advanced Course for
captains) teach some basic management and supervisory skills, the major
emphasis is on specialty-related tasks, and separate schools are run by
the specialty branches. The graduate of a Basic or Advanced School is
expected to be technically proficient in specialty skills.

In a survey of general officers concerning the Army officer education
and training programs (Van Nostrand and Wallis, 1978), attitudes
were identified as follows:

- Management should be taught (at CGSC) but not at the
Basic and Advanced Courses where officers are taught to be
technicians in their branch specialties.

- CGSC should teach those brilliant young officers who are to
provide the staff and general officers who will run the Army
for the next 10 to 20 years. (Approximately 6-7 years after
attending CGSC the officers are competitively selected to
attend the Army War College.)

- The Army should go to the university concept.

- What should be taught:
Conceptualization, even though difficult
Develop truly general staff officers
Research, write and brief on solutions to real issues

ISD USED FOR CURRICULUM DEVELOPMENT

All of the US Army schools for officers, except the US Military
Academy and the Army War College, are monitored by the US Army Training
and Doctrine Command (TRADOC). Curriculum development within TRADOC
doctrine requires that the TRADOC monitored schools use the Instructional
Systems Development (ISD) process as a systems approach to the development
and evaluation of training (TRADOC Pamphlet 350-30). Although ARI
research is concerned with all five phases (Analyze, Design, Develop,
Implement and Control) of the ISD model, this paper is directed to those
phases which require occupational analysis to provide decision-making
data. These are:

Analyze - (a) Determine tasks to be taught (front-end analysis);

          (b) Determine setting in which each task will be taught

Control - (a) Internal evaluation: how well did students meet
              the stated objectives?

          (b) External evaluation: how well do graduates perform
              on the job? Usually determined by performance
              evaluations of the school-taught tasks with feedback
              information to the schools.

Procedures  evolved  through  the  use  of  ISD  in  the  Army  schools  have 
proved  useful;  they  represent  many  person-years  of  effort  to  develop  a 
workable,  systematized  training  approach.    Some  of  these  procedures  are: 

a.  Occupational  description  techniques  developed  to  define  a 
position  in  terms  of  tasks  having  specific  beginning  and  ending  times, 
cue  to  perform,  and  step-by-step  (or  procedural)  description  of  how  the 
task  is  to  be  performed.     These  techniques  have  proved  useful  for  the 
majority  of  enlisted  tasks  and  for  many  of  the  NCO  and  company  grade 
officer specialty-unique tasks. The majority of the Army schools responsible
for training for these jobs need concern themselves with only
those jobs which are unique to their specialties.

b. The crosswalk from occupational analysis to training requirements
has been successfully addressed for enlisted personnel. However,
the  problem  of  training  requirements  of  supervisors  and  managers  at  the 
non-commissioned  officer  (NCO)  level  based  on  job  descriptions  has  not 
yet  been  resolved.     This  problem  has  already  surfaced  for  company  grade 
officers  in  the  recently  initiated  TRADOC  program  for  defining  officer 
tasks. 

c.  Criticality  has  been  refined  to  four  measures  commonly  called 
"the  four-factor  model."    However,  this  refinement  is  inadequate  to 
answer  all  criticality  questions. 

d. A concept that permeates all descriptions of ISD is "train for
the next job to be performed," i.e., if the trainee will not use the skill
very soon there may be no reason for training it; the learning retention
decay rate may prove the training resources could better be allocated
elsewhere.

The ISD process is proving to be very difficult to implement at the
Command and General Staff College. The standards or concepts noted
above, although not necessarily "standard" in the original ISD reports
(Branson et al., 1975), are particularly difficult to apply to the
curriculum for field grade officers. The CGSC curriculum, which does not
focus on specialty proficiency but rather on a general broadening of
horizons for field grade officers, cannot be fitted to the conventional
front-end analysis techniques of the ISD process.

First, as CGSC serves the entire Army, not just a few specialties,
the sheer size of the data base is a problem: all field grade officer
positions  in  the  Army  must  be  subjected  to  occupational  analysis  for 
creation  of  the  task  lists.    Using  the  assumption  (as  is  usually  now  the 
case)  that  a  supervisor's  job  must  include  generalized  management  tasks 
plus  a  knowledge  of  the  tasks  of  all  the  supervised  personnel,  the  size 
of  the  data  base  is  multiplied  by  some  unknown  factor. 

Second,  a  unit  of  instruction  usually  teaches  several  related 
tasks.  As  the  data  base  becomes  larger  it  becomes  more  and  more  difficult 
to  find  all  of  the  related  tasks.    Unfortunately,  the  task  analysis 
techniques  do  not  yield  tasks  which  fit  clustering  requirements  for  CGSC 
curriculum  development. 

Next, CGSC is a master's level degree-granting institution and is,
in this respect, unique among the TRADOC schools. The concept of CGSC
as an institute of higher learning, providing the foundation for future
individual officer self-development and growth (to "think and decide"),
requires that subjects be taught which are not based on "next assignment"
but are general education in many different fields.

Further,  as  CGSC  is  the  formal  training/education  institution  for 
the  Army  "middle  managers,"  many  of  the  tasks  for  which  CGSC  does  train 
are  non-procedural  in  nature,  i.e.,  these  tasks  are  difficult,  perhaps 
impossible  to  define  in  terms  of  cue  to  perform,  begin  and  end  points, 
steps  to  perform,  and  evaluation  criteria. 

Even  more  difficult  is  the  choosing  of  criteria  on  which  to  base 
the  train/don't  train  decision.     The  four  factor  criteria  used  for 
enlisted  and  branch  specific  tasks  do  not  apply.     A  concept  that  has 
been  popular  recently  is,  "the  officer  is  much  more  than  the  sum  of 
those  skills  in  which  proficiency  can  be  demonstrated."    Consider  the 
following hypothetical example: If most field grade officers spend 50%
of their time reading paper work of some type and less than 1% of the
time making decisions, should CGSC train them to read paper work, or
should more resources be spent in teaching good decision-making?

PREVIOUS  FIELD  GRADE  OFFICER  OCCUPATIONAL  ANALYSIS  RESEARCH  BY  ARI 

Responding to personnel management needs, ARI has been working on
the Duty Module concept since 1970. A Duty Module represents a significant
work activity, is applicable to a number of different duty positions,
and describes the various jobs in a common language. A Duty Module is
smaller than an MOS or any one job within an MOS and larger than a task.
It is actually a cluster of 10 to 20 tasks that relate, occupationally
and organizationally, in meaningful ways. These tasks are very much
like the tasks produced by other job analysis techniques, but the
significant difference is that a major emphasis of the original research
was to produce meaningful task clusters. These horizontal clusters can
be used as building blocks, or "plug-in" units, to describe the significant
duties of any job using only a few Duty Modules. Duty Modules are
also designed for describing jobs at all levels of responsibility (vertically
clustered). Therefore, the full interrelationship among jobs, across
all specialties and for all officer grades, both similarities (commonality)
and differences, can be codified.

Although the Duty Module methodology could be applied to civilian
organizations, or to enlisted or NCO positions, the research was directed
to support of the Officer Personnel Management System (OPMS), and the
present data base is essentially complete for officer duties common to
all the OPMS specialties. Further development to complete the OPMS data
base would necessitate creation of only a limited number of specialty-specific
Duty Modules.

APPLICATION  OF  DUTY  MODULE  TECHNIQUE  TO  FRONT-END  ANALYSIS 

Most CGSC graduates will be assigned as staff officers, some at
very high levels; others may assume command of a battalion or brigade.
The commander's management role is analogous to that of the operations
manager of a medium-sized manufacturing company. Additional duties of
the position require responsibility for the unit as it trains to achieve
and maintain combat readiness during peacetime, with the capability for
rapid transition to combat effectiveness during war. The resources
available to, and therefore controlled by, one Armor battalion commander
consist of approximately 550 personnel, a $55 million investment in
equipment, and annual expenditures of $13 million. The staff role of
the CGSC graduate can have comparable responsibility. In the context of
increasingly constrained training resources, the growing importance of
training quality cannot be overstressed. As the quality of training is
dependent  upon  the  adequacy  of  the  front-end  analysis,  those  responsible 
for  CGSC  curriculum  development  have  a  continuing  concern  with  development 
of  better  front-end  techniques.     In  keeping  with  this  concern,  the  CGSC 
has  used  both  formal  and  informal  channels  to  obtain  feedback  on  the 
appropriateness  and  utility  of  the  instruction.     This  concern  has  stimulated 
many  students  to  study  some  aspects  of  curriculum  development  as  part  of 
their  independent  research  requirement. 

A recently completed CGSC Master's thesis (Norris and Robbins,
1977) explored the feasibility of utilizing Duty Modules for the front-end
analysis of the CGSC regular course. The thesis is based in part
upon earlier ARI work: Cory, Medland, and Uhlaner (1977); Davis and
Korotkin (1975); Korotkin et al. (1975); and others. This thesis
develops the concept of Duty Modules as the vehicle for the ISD Analysis
phases of CGSC curriculum development.

Although Norris and Robbins point out some possible shortcomings of
the Duty Module approach, they nonetheless conclude that theoretically,
"... Duty Modules offer an attractive approach to this problem and
have the major advantage of being beyond the 'drawing board stage'.
Duty  Modules  are  a  reality  and  the  effort  in  time  and  resources  to 
apply  these  concepts  to  the  college  is  far  less  than  that  required 
to  develop  new  methodology." 

The  need  for  empirical  validation  of  the  Norris  and  Robbins  approach 
stimulated  the  CGSC  Commandant's  request  that  ARI  conduct  the  prototype 
feasibility  research  which  was  initiated  during  the  fall  of  1977.  The 
design  of  the  prototype  analysis  was: 

a. Identify two significant assignments filled by CGSC graduates.

b.  Identify  the  CGSC  courses  or  sub-courses  which  prepared  the 
officer  for  the  identified  assignments. 

c.  Describe  both  the  course  curriculum  and  the  assignments  using 
the  Duty  Module  structure. 

d. Compare each assignment Duty Module structure with the Duty
Module structure of the related CGSC course. Commonality will be indicative
of the degree of correlation between training and job requirements. Significant
commonality would indicate a high degree of overlap between content
taught and skills required on the job. Little or no commonality
would indicate one or more of the following:

1. CGSC is teaching material not required or necessary to the job.

2. CGSC is not teaching skills required by the assignment.

3. The Duty Module approach is not feasible.
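
The comparison in step d can be pictured as a set comparison between the Duty Modules describing an assignment and those describing a course. The sketch below is one possible reading, not the procedure used in the research: commonality is assumed to be the share of assignment modules also found in the course, and the module labels (DM-A and so on) are placeholders rather than real Duty Module identifiers.

```python
def commonality(assignment_modules, course_modules):
    """Compare two Duty Module structures given as sets of module labels."""
    shared = assignment_modules & course_modules
    return {
        "shared": shared,
        # Interpretation 1: material taught but not required by the job.
        "taught_not_required": course_modules - assignment_modules,
        # Interpretation 2: skills required by the job but not taught.
        "required_not_taught": assignment_modules - course_modules,
        # Assumed measure: fraction of the assignment covered by the course.
        "coverage": (len(shared) / len(assignment_modules)
                     if assignment_modules else 0.0),
    }


# Placeholder module labels, for illustration only.
assignment = {"DM-A", "DM-B", "DM-C", "DM-D"}
course = {"DM-A", "DM-B", "DM-E"}
result = commonality(assignment, course)
print(result["coverage"])
print(sorted(result["taught_not_required"]))
print(sorted(result["required_not_taught"]))
```

A low coverage value alone does not distinguish among the three interpretations listed above; the two difference sets show which of the first two applies, while the third has to be judged separately.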

Two  assignment  areas  were  selected  to  represent  disparate  duties 
and  relate  to  specific  instructional  areas: 

a.  Combat  commander;  related  course  is  "Battle  Captains" 

b. Staff assignments at Office of the Secretary of Defense (OSD),
Office of the Joint Chiefs of Staff (OJCS), Department of the Army (DA),
and Army major commands; related course is "High Level Staff Applications."

The Battle Captains course is one course of a sequence of five
orientation courses given as refresher training to command designees
(lieutenant colonel and colonel) prior to their assumption of command.
Each  of  these  five  courses  closely  matches  one  of  the  six  previously 
validated  Duty  Modules  which  apply  to  unit  commanders,  although  detailed 
analysis  was  not  performed  for  the  four  not  taught  at  CGSC.    The  subject 
matter  of  one  of  the  Duty  Modules,  "General  Administration,"  is  not 
taught  in  these  orientation  courses  and  it  must  be  assumed  that  the 
officer  retains  the  necessary  knowledge  and  skills  from  previous  education 
and  on-the-job  training. 

Comparison  of  the  detailed  task  analyses  of  the  Battle  Captains 
course  and  of  the  0-U-l  Duty  Module,  "Directs  and  controls  employment  of 
Infantry  and  Armor  maneuver  unit,"  shows  that  the  tasks  taught  and  the 
tasks  performed  correspond  exactly.    Using  the  same  technique  it  should 
be  possible  to  compare  the  other  four  orientation  courses  and,  if  necessary, 
to develop another course for the general administration module. For
this  course  we  can  say  that  the  Duty  Module  front-end  analysis  procedure 
is  feasible. 

The  comparison  of  relevant  duty  modules  and  the  High  Level  Staff 
Application  Course  was  more  difficult.    To  adequately  describe  the 
position,  "Action  Officer,  High  Level,"  it  was  necessary  to  create  one 
new  Duty  Module,  "Performs  action  officer  functions  on  a  high  level 
staff."    Verification  of  this  new  Duty  Module  was  accomplished  by  interviewing 
a  sample  of  20  respondents  holding  high  level  staff  positions.    Although  i 
all  20  respondents  performed  the  new  module,  it  was  necessary  to  use  17  i 
Duty  Modules  from  the  data  base  to  adequately  describe  their  positions. 
It  is  unusual  to  need  as  many  as  18  for  20  similar  positions,  but  the 
job  incumbents  represented  8  different  branches,  11  primary  specialties 
and 12 alternate specialties (a total of 19 different specialties). The
18 Duty Modules performed by the surveyed incumbents were all, except
the new one, specialty related and, therefore, would not be of concern
to CGSC; they would, or should, have been taught at the specialty related
schools and during earlier attendance at CGSC. These modules had been verified
in earlier research but were, however, examined to assure that they did
continue  to  accurately  describe  the  duties. 

An examination of the program of instruction (POI) revealed these
five  subject  areas: 

a.  The  organization,  functions  and  relationships  between  OSD,  OJCS, 
Office of the Secretary of the Army (OSA), and Office of the Chief of
Staff of the Army (OCSA).

b.  The  organization,  functions  and  relationship  to  DA  of 

-  Headquarters,  TRADOC 

-  Headquarters,  DARCOM 

- Headquarters, FORSCOM

c. The organization, functions and relationship of Headquarters,
US  Readiness  Command  to  OJCS. 

d.  DA  staffing  procedures  to  include  rewriting  a  decision  memorandum 
into  175  words  or  less  and  writing  two  information  papers  of  175  words  or 
less. 

e. Staff techniques and procedures used within the OJCS.

Of the five subject areas, the last two listed, being performance oriented,
lend themselves to a front-end analysis using Duty Module techniques.
The first three subjects are informational in nature and cannot be
directly translated into a Duty Module structure.


HIGH LEVEL STAFF COMPARISON


CGSC COURSE: HIGH LEVEL STAFF

SUBJECT AREAS:

a. Organization, functions and relationships between OSD, OJCS,
OSA and OCSA.

b. Organization, functions and relationships to DA of TRADOC,
DARCOM and FORSCOM.

c. Organization, functions and relationships of US Readiness
Command to OJCS.

d. DA staffing procedures including writing decision memorandum
and two information papers.

e. Staff techniques and procedures used within the OJCS.


DUTY MODULE: PERFORMS ACTION OFFICER FUNCTIONS ON A HIGH LEVEL STAFF

TASKS:

a. Prepare decision memoranda, information memoranda, information
papers, and other similar documents for a superior.

b. Represent superior in action officer meetings.

c. Process joint staff action directives.


FIGURE 1

Comparing subject areas d and e from the curriculum with the tasks
in the new Duty Module, one can see a close correlation (see Figure 1).
This  signifies  that  these  subject  areas  should  be  included  in  the  course 
curriculum.    This  type  of  comparison,  however,  does  not  lend  itself  to  a 
statistical  analysis  so  it  is  not  possible  to  state  a  confidence  level 
with  which  one  can  say  they  should  be  included,  or  what  percentage  of 
the  time  should  be  devoted  to  them,  especially  as  only  some,  not  all  of 
the  officers  use  the  OJCS  staffing  procedures. 

To  explore  the  applicability  of  the  methodology  to  the  first  three 
subject  areas  a  questionnaire  was  administered.     Respondents  were  asked 
to indicate the degree of understanding, ranging from "comprehensive" to
"no understanding," which they needed of OSD, OJCS, DA, TRADOC, FORSCOM,
DARCOM, US Readiness Command, or other similar headquarters in order to
perform their assigned duties. Not surprisingly, the survey sample
composed of DA and DARCOM staff officers indicated a need for a high
level  of  understanding  of  the  organization  and  functions  of  their  own 
headquarters.     Next  followed  OSD,  TRADOC,  FORSCOM,  OJCS,  and  US  Readiness 
Command,  in  that  order.    One  can  deduce  that  the  "need  to  know"  rating 
of  any  headquarters  would  go  up  if  officers  from  that  headquarters  were 
included  in  the  survey  sample.    It  does  appear  significant,  however, 
that  the  US  Readiness  Command  received  lower  need  to  know  ratings  from 
the survey sample than did write-ins for US Army Europe (USAREUR). This
outcome suggests that consideration be given to examining whether Headquarters,
USAREUR should replace US Readiness Command in the POI.
Before this consideration, however, a larger survey which includes
officers from all of the designated offices should be performed. If the
result still holds true, the POI decision should be made by training
experts;  there  may  be  valid  reasons  for  including  a  joint  headquarters 
in  the  curriculum  to  the  exclusion  of  a  major  overseas  command. 

When  courses  teach  performance-oriented  skills,  it  is  logical  that 
the  skills  should  appear  in  a  Duty  Module  for  some  Army  job,  as  Duty 
Modules are a priori performance-oriented. When courses teach information,
that  information  will  not  appear  in  a  Duty  Module  directly,  but  only  in 
a  performance  task  which  is  influenced  by  the  information  acquired. 
This is easily seen in the results of the two comparisons. The skills
taught by the Battle Captains Course are performance-oriented; the Duty
Module approach was completely successful for a front-end analysis of
this course. The High Level Staff Application Course teaches some
performance skills and some knowledges (information); the Duty Module
approach was only partially successful for this front-end analysis.

RESULTS 

The results of the feasibility study can be stated:

a.    For  performance-oriented  skills  it  is  feasible  to  use  Duty 
Modules  to  fully  describe  all  positions  filled  by  graduates  of  CGSC, 
then compare the applicable Duty Modules with the Duty Module structures
of the scope and instructional objectives of a large portion of the
college curriculum to "identify curriculum needs and define CGSC output,
both critical elements in resource justification" (Norris and Robbins,
Ibid). 

b.     The  Duty  Module  approach  is  not  adequate  for  those  courses  or 
sub-objectives  of  courses  which  are  designed  to  impart  knowledges  or 
information;  expanded  front-end  analysis  techniques  must  be  developed 
for  these. 

RESEARCH  IN  PROGRESS 

FRONT-END  ANALYSIS  FOR  NON-PROCEDURAL  TASKS 

Several  of  the  schools  in  the  TRADOC  community  have  identified 
needs  for  new  front-end  analysis  techniques  for  those  parts  of  the 
curriculum which are difficult to describe in terms of performance-oriented
(procedural) tasks, such as administrative skills, communication skills
(interpersonal as well as reading, writing and briefing), and leadership.
This  first  effort  is  to  make  more  explicit  the  procedures  for  defining 
these  non-procedural  assignment  requirements  which  should  be  included  in 
education/training  programs  but  which  are  not  normally  described  in 
officer  job  descriptions.     Several  alternative  methods  for  representing 
these  additional  data  for  inclusion  in  job  analyses  are  being  considered. 
Some  of  these  are: 

a.  A  simple  task  list  prepared  in  CODAP-type  format  using  the 
scales  developed  for  enlisted  positions, 

b.  A  similar  CODAP-type  format,  but  using  scales  which  "expert 
opinion"  feels  should  be  used  for  officer  surveys, 

c. A questionnaire at the level of topics in a Program of Instruction
rather than tasks (this alternative will also examine alternative types
of responses, ranging from simple "yes" or "no" responses to the question,
"is it needed?", to having a job incumbent allocate the proportion of
instructional hours that s/he feels would optimally prepare someone for
an assignment), and

d.  A  questionnaire  on  the  POI  but  using  the  CODAP  task  format. 

As  this  is  an  exploratory  effort  the  questions  that  we  hope  will  be 
answered  are  qualitative,  not  quantitative.     These  are: 

a.  Can  these  skills  be  adequately  described  in  terms  of  tasks 
and/or  topics? 

b.  Are  the  representations  of  these  skills  meaningful  to  job 
incumbents  who  have  been  trained  in  these  skills  and  are  now  in  positions 
where these skills are required?

c. Are the data meaningful to CGSC curriculum development personnel?


TRAINING  INFORMATION  FEEDBACK  SYSTEM 

Concurrent  with  the  exploratory  effort  for  non-procedural  tasks  is 
the  development  of  a  Training  Information  Feedback  System  (TIFS)  for 
CGSC.    The  objective  is  to  create  clusters  of  similar  tasks  into  data 
elements  called  Job  Certification  Components  (JCCs)  which  can  be  used  in 
computerized data bases for individual officer competency certification,
for feedback to curriculum developers, for career management, and for
specialty proponents' professional development programs.

The  JCCs  can  be  useful  for  curriculum  development  if  it  is  possible 
to  show  differential  performance  between  those  officers  who  have  attended 
the  appropriate  education/training  course(s)  and  officers  who  have  not 
attended  but  are  nonetheless  serving  in  the  same  positions  as  graduates. 
The  target  population,  therefore,  is  made  up  of  field  grade  officers 
serving  in  the  same  positions,  the  first  set  being  graduates  of  CGSC  and 
the  second  set  being  those  who  have  not  attended.    Occupational  analysis 
techniques  will  be  used  for  development  of  JCCs  for  these  duty  positions. 
JCCs  will  then  be  verified  as  useful  for  a  TIFS  for  CGSC  by  analysis  of 
job performance of officers from both sets.


A Technique for Selecting
Electronic  Specialties  for  Consolidation 


by 


Hendrick  W.  Ruck 


Air  Force  Human  Resources  Laboratory 
Brooks  AFB,  Texas 


The  opinions  and  conclusions  expressed  in  this  paper 
are  those  of  the  author  and  are  not  necessarily 
those  of  the  United  States  Air  Force. 


The  Air  Force  occupational  classification  system  for  enlisted 
personnel is composed of approximately 250 specialties. These specialties
cover a wide range of occupations such as band members, medical
technologists,  pneudraulics  repairmen,  and  aircraft  control  and  warning 
radar  repairmen.    Approximately  50  of  these  specialties  are  generally 
considered  to  be  "electronic  specialties."    These  electronic  specialties 
(see  Table  1)  are  vital  to  the  Air  Force,  since  the  airmen  in  these 
specialties  have  the  responsibility  for  maintaining  the  Air  Force's 
global  communications  network,  defensive  surveillance  systems,  and  air 
navigation and communication systems. Airmen in these specialties
comprise somewhat more than 10 percent of the enlisted force. Even
more  important,  though,  is  the  investment  the  Air  Force  makes  in 
training  these  airmen.    Technical  training  designed  to  give  initial 
skills  to  airmen  in  electronic  specialties  is  costlier  than  technical 
training  in  other  specialties  in  both  time  and  equipment.    For  purposes 
of  efficient  personnel  management,  effective  personnel  utilization, 
and  efficacious  training,  the  Air  Force  is  seriously  considering 
consolidating  several  of  these  specialties. 


Table  1 

Examples  of  "Electronic"  Specialties 


302X0   Weather Equipment Specialist
305X4   Electronic Computer Systems Specialist
316X2   Missile Electronic Equipment Specialist
316X3   Instrumentation Mechanic
321X1   Defensive Fire Control Systems Mechanic
328X3   Electronic Warfare Systems Specialist
341X3   Analog Flight Simulator Specialist
341X6   Digital Navigation/Tactics Training Devices Specialist
361X0   Outside Wire & Antenna Maintenance Repairman
362X3   Missile Control Communications Systems Specialist
403X0   Biomedical Equipment Maintenance Specialist


Consolidation of specialties offers the Air Force several advantages.
These advantages include (a) increased operational flexibility
in utilization of personnel within field units, (b) simpler assignments
due  to  larger  pools  of  eligible  incumbents  and  fewer  specialties,  (c) 
simpler  training  due  to  fewer  initial-skill  courses,  and  (d)  reduced 
manning,  since  specialists  would  have  broader  expertise  and  therefore 
fewer  specialties  (and  specialists)  would  be  involved  in  maintaining 
complex  systems. 

Electronic  principles  are  relatively  well  defined  and  are  generally 
regarded  as  necessary  prerequisite  knowledge  for  job  proficiency  in 
electronic  specialties.    The  assumption  that  there  are  underlying 
principles  that  are  common  across  electronic  specialties  has  offered 
the  possibility  of  studying  the  actual  overlap  in  electronic  principle 
utilization  among  these  specialties.    Electronic  principles  are  rather 
easily  identified,  since  the  Air  Force  offers  common  core  courses  in 
electronics  as  prerequisites  to  entry  into  the  equipment  portion  of 
specialist  courses.    The  purpose  of  this  paper  is  to  present  preliminary 
results  of  a  commonality  analysis  among  20  of  the  electronic  specialties 
and  to  discuss  the  procedures  used  in  the  analysis.    The  electronic 
specialties  analyzed  in  this  study  are  all  from  two  career  fields, 
Communications-Electronics Systems and Wire Communications Systems.
These fields contain 24 specialties that maintain ground communications
systems. Air Force managers have expressed interest in reducing the
number  of  ground  electronics  specialties  for  reasons  described  earlier. 

Commonality  and  Consolidation  Considerations 

When  looking  at  the  feasibility  of  consolidating  specialties, 
information  concerning  at  least  three  personnel-related  subsystems  is 
required:  the  training,  manning,  and  recruiting  subsystems.    An  outline 
of  the  information  that  must  be  synthesized  and  analyzed  in  the  process 
of  making  consolidation  decisions  is  presented  in  Table  2. 


Table  2 

Some  Considerations  Relating  to  Consolidation  of  Specialties 

Training                  Manning                 Recruiting

Equipment Similarity      Work Center Location    Recruiting Difficulty

Job/Task Similarity       Total Manning           Aptitude Requirements

Underlying Principles/    CONUS/Overseas Ratio    Attrition
Knowledge Similarity

                          Unit Manning


Data relating to the similarity of equipment maintained or used
in  different  specialties  can  be  gathered  from  routine  occupational 
surveys,  special  surveys,  logistics  or  functional  managers,  or  technical 
orders.    Regardless  of  the  data  source,  however,  expert  judges  would 
be required to provide measures of similarity. In the ground communications
electronics area alone (24 specialties) over 1400 end items are
managed  within  the  Air  Force  logistics  system.    It  is  difficult  to 
estimate  how  many  additional  major  command  specific  end  items  are  in 
the  inventory.    If  one  extends  a  similarity  analysis  to  the  total 
electronics  community,  it  can  be  readily  seen  that  generating  similarity 
overlap  measures  on  end  items  of  equipment  for  electronic  specialties 
would  be  an  overwhelming  task.    The  difficulty  in  such  an  analysis  is 
that  judges  familiar  with  several  items  of  equipment  would  be  required 
to  estimate  similarity.    Aside  from  the  statistical  problem  of  combining 
judgments  made  on  different  combinations  of  equipment  end  items,  there 
are  at  least  two  other  difficulties.    One  is  the  number  of  judges  that 
may  be  required,  and  the  other  is  the  definition  of  the  dimensions  of 
similarity. 

As  a  second  approach,  one  might  ask,  then,  how  difficult  is  it  to 
gauge  the  similarity  of  jobs  and  tasks  performed  in  the  electronics 
community?   A  ready  source  of  data  exists  since  Air  Force  occupational 
survey  data  have  been  collected  and  analyzed  for  most  of  the  specialties 
in question. Once again, the scope of the problem limits the appropriateness
of this approach. Several thousand tasks performed by persons
in  several  hundred  job  types  have  been  identified  in  the  occupational 
analysis of electronic specialties. Even where tasks are worded similarly,
experts must judge equivalence between tasks. Since no acceptable
taxonomy  has  been  developed  for  tasks  or  job  types,  it  would  be  an 
extremely  laborious  task  to  compare  all  tasks  with  one  another  or  all 
job  types  with  one  another. 

What,  then,  could  be  done  to  reduce  the  magnitude  of  the  comparison 
problem?    Developing  a  priori  groupings  of  specialties  that  are  good 
candidates  for  consolidation  is  one  solution.    Various  groupings  have 
been  developed  and  sponsored  by  different  Air  Force  and  major  command 
managers. Unfortunately, these grouping schemes have rarely been consistent
with one another and, therefore, considerable debate has
arisen  concerning  each  proposal.    An  empirical  approach  to  developing 
grouping schemes would be possible since data on utilization of fundamental
electronic principles could be available for all of the specialties
in question. The rationale for this approach would be that
initial  groupings  based  on  common  underlying  principles  should  be 
developed  and  subsequent  analyses  may  then  be  performed  for  specialties 
with  high  commonality  in  principles.    Under  this  approach  the  assumption 
is  made  that  it  would  be  unwise  to  consolidate  specialties  that  have 
little  or  no  commonality  in  principles  or  knowledge  used  on  the  job. 
Although  this  assumption  may  appear  at  first  to  be  trite,  it  may  be 
appreciated  by  those  who  have  familiarity  with  some  of  the  decisions 
on specialty structure that have been made in the past.

The considerations relating to manning are somewhat easier to
measure  than  the  training  considerations.    At  the  unit  level,  certain 
specialties  have  been  traditionally  undermanned,  while  others  receive 
priority  manning.    For  the  problem  at  hand,  ground  communications- 
electronics  maintenance,  unit  manning  is  critical.    This  is  due  to  the 
fact  that  many  of  the  positions  require  round-the-clock  manning  by 
fully-qualified  personnel.    Units  that  require  more  than  one  specialty 
on  24-hour  duty  should  be  identified,  and,  if  there  is  agreement, 
those  specialties  would  be  good  candidates  for  consolidation  providing 
enough  job  similarity  exists.    CONUS/overseas  ratio  considerations  are 
easily  measured  in  terms  of  ratios.    Traditionally,  specialties  with 
high  overseas  imbalances  have  been  suggested  as  candidates  for  merging 
with  specialties  that  have  high  CONUS  ratios.   Again,  such  possibilities 
should  be  tempered  with  job  similarity  measures. 

Recruiting  considerations  are  more  difficult  to  use  in  making 
consolidation  decisions.    Obviously,  specialties  that  are  consolidated 
should  have  similar  aptitude  requirements.    However,  the  impact  of 
recruiting  difficulty  and  attrition  on  consolidation  decisions  requires 
policy  makers'  and  researchers'  attention.    It  is  not  clear  whether  it 
would  be  in  the  best  interest  of  the  Air  Force  to  merge  specialties 
with high and low recruiting difficulty. Would such a merger average,
increase, or lower subsequent recruiting difficulty for the new specialty?
Similar  questions  arise  in  dealing  with  the  impact  of  merging  on 
subsequent  attrition. 

This  paper  is  concerned  with  developing  candidates  for  consolidation 
based  on  underlying  principles  and  knowledge  similarity  of  jobs.  It 
is  assumed  that  other  considerations  and  analyses  suggested  here  would 
be  made  following  the  similarity  analyses  made  on  underlying  principles. 

Electronic  Principles  Job  Inventory 

The  Electronic  Principles  Job  Inventory  (EPI)  and  its  development 
have  been  presented  previously  (O'Connor,  Ruck  &  Driskill,  1978;  Ruck, 
1977)  and  therefore  will  be  only  briefly  described  in  this  paper.  The 
EPI  contains  1257  items  covering  the  universe  of  electronic  fundamentals 
as  defined  by  Air  Training  Command  fundamental  courses  (as  of  1974) 
and  instructors  and  supervisors  of  those  courses.    The  1257  items  were 
written  so  that  the  job  incumbent  could  indicate  whether  or  not  he  or 
she  uses  each  principle  on  the  present  job.    Lead-in  questions  and 
routing  instructions  were  provided  to  minimize  the  time  required  to 
complete  the  EPI  booklet.    For  many  sets  of  questions  a  "do  not  remember" 
question  was  included  as  an  item  after  a  list  of  detailed  items  was 
offered.    This  allowed  the  incumbent,  for  example,  to  indicate  that  he 
or  she  replaced  capacitors  on  the  present  job  but  could  not  remember 
which type of capacitor was involved. Table 3 presents sample questions.
It  is  important  to  note  that  the  EPI  was  developed  at  the  Occupational 
Measurement  Center  for  the  express  purpose  of  course  validation  and 
was not originally intended to be a research tool. The Occupational
Measurement Center has collected EPI data from 59 specialties as of
this writing.


Table 3

Sample EPI Questions


E1-1   Do you work with coupling devices in your present job? If no, go
       to item E2-1; if yes, continue.

Do you identify on schematic diagrams and relate to the actual circuitry
the components associated with any of the following types of coupling?

E1-2   RC coupling
E1-3   Impedance coupling
E1-4   Transformer coupling

Do you work with any of the following types of coupling circuits?

E1-8   Directly coupled circuits
E1-9   Capacitive-resistive coupled circuits
E1-10  Capacitive-inductive coupled circuits
E1-11  Transformer coupled circuits
E1-12  Don't remember which type of coupling


Methods Used to Measure Commonality

The EPI was selected as the instrument to measure underlying
principles/knowledge used within each specialty and, ultimately, as
the input for commonality analysis. The EPI, for purposes of this
analysis, is assumed to have included all of the relevant principles
or knowledge required within the Air Force electronics community.
Further, each item is assumed to have similar meanings across different
specialties. Both assumptions are justified based on the development
and validation procedures used in generating the instrument.

The criterion measure from the EPI that was selected was the
percent of journeymen (5-skill level) personnel in each specialty
answering "yes" to each item. Several difficulties arise in determining
commonality once the criterion has been selected. First, no specialties
have 100 percent overlap in principles used. Second, no measure of
criticality (importance, difficulty, complexity, etc.) is currently
available for inclusion in the analysis. Third, it is impractical to
merge simple specialties with complex ones. Last, correlational
measures could be quite misleading due to the possible high number of
common zeroes. That is, correlations would be inflated by the
number of items on which many specialties have zero responses in
common.
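
The inflation caused by common zeroes can be illustrated with a small sketch. The percentages below are invented for illustration and are not EPI data; the `pearson` helper is an ordinary product-moment correlation written out so the example is self-contained.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical percent answering "yes" on four items that both specialties use.
used_a = [60, 20, 40, 80]
used_b = [30, 70, 50, 40]

# Twenty further items that neither specialty uses (percent "yes" is zero).
unused = [0] * 20

r_used_only = pearson(used_a, used_b)
r_with_zeroes = pearson(used_a + unused, used_b + unused)

print(f"items used by either specialty: r = {r_used_only:.2f}")
print(f"all items, with common zeroes:  r = {r_with_zeroes:.2f}")
```

On the items actually used, the two hypothetical specialties disagree and the correlation is negative; once the shared zero items are added, the correlation becomes strongly positive even though nothing about the used items has changed.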


The statistical technique used in the analysis was Ward's hierarchical
clustering technique (Ward, 1961). The sum, over the 1257 items, of the
absolute values of the differences in percent using each item was
input as the difference measure. The technique was employed so that
common  principles,  the  degree  to  which  principles  are  used,  and  the 
size  of  each  group  being  analyzed  would  be  considered.  Correlations 
among  specialties  that  grouped  in  the  cluster  analysis  were  analyzed 
to  provide  additional  interpretation  of  the  overlap  figures.    Since  no 
attempt  was  to  be  made  to  totally  reorganize  all  electronic  specialties, 
separate grouping analyses were performed for the 16 ground communications
electronic specialties and the 4 wire and cable specialties.
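
The difference measure and the general idea of hierarchical grouping can be sketched as follows. This is a sketch under stated assumptions, not the procedure used in the study: the profiles are invented, and the merge rule (smallest mean pairwise difference) is a simple stand-in for Ward's technique, which also takes group size into account.

```python
def l1_difference(p, q):
    """Sum of absolute differences in percent using, item by item."""
    return sum(abs(a - b) for a, b in zip(p, q))


def group_specialties(profiles, n_groups):
    """Merge the closest groups repeatedly until n_groups remain.

    profiles maps a specialty label to its list of percent-using values.
    Stand-in merge rule: smallest mean pairwise L1 difference.
    """
    groups = [[name] for name in profiles]

    def mean_difference(g1, g2):
        pairs = [(a, b) for a in g1 for b in g2]
        total = sum(l1_difference(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(groups) > n_groups:
        candidates = [(i, j) for i in range(len(groups))
                      for j in range(i + 1, len(groups))]
        i, j = min(candidates,
                   key=lambda ij: mean_difference(groups[ij[0]], groups[ij[1]]))
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Hypothetical profiles over four items (the EPI has 1257).
profiles = {
    "S1": [90, 80, 0, 0],
    "S2": [85, 75, 5, 0],
    "S3": [0, 10, 70, 60],
    "S4": [5, 0, 65, 70],
}
groups = group_specialties(profiles, n_groups=2)
print(groups)
```

Because the measure is a sum of absolute differences rather than a correlation, items that two specialties both leave unused contribute nothing to it, which avoids the common-zero problem noted earlier.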

Preliminary  Results 

Several  specialties  failed  to  group  with  other  specialties  in  the 
pool.    That  is,  they  exhibited  low  overlap  values  with  the  most  similar 
specialty or low correlations with the most similar specialty.
Table 4 lists these specialties and pertinent EPI data. Two of the
specialties, Telecommunication Systems/Equipment Maintenance (AFS 30652)
and Telecommunications Systems Console Specialist/Attendant (AFS 30750),
have  very  low  utilization  of  electronic  fundamentals,  and  would  not 
appear  to  be  good  candidates  for  consolidation  with  any  of  the  more 
complex  specialties.    The  Television  Equipment  Repair  Specialty  (AFS 
30455)  appears  to  have  low  commonality  with  other  specialties  even 
though  it  has  rather  high  utilization  of  fundamentals.    Further  detailed 
analysis  is  required  for  this  specialty.    Similarly,  further  analysis 
is required to determine why the Auto Tracking Radar Repair Specialty
(AFS  30353)  has  low  commonality  in  spite  of  moderate  usage  of  principles. 

Results  of  the  grouping  analyses  are  shown  in  Table  5.  Since 
four  specialties  were  omitted  from  this  analysis,  these  results  should 
be  viewed  as  suggestive  in  nature.    The  groupings  of  specialties  have 
been  reviewed  by  Air  Force  managers  and  technicians  in  the  ground 
communications  electronics  career  field  for  their  comments  prior  to 
performing  additional  analyses.    It  should  be  noted  that  the  groupings 
displayed  in  Table  5  would  be  further  analyzed  using  the  additional 
considerations discussed earlier in this paper. Several of the groupings
are congruent with recommendations that have already been made,
and therefore validate prior judgments. Group E (Telephone Switching
Equipment and Telephone Equipment Installation/Repair) has been formally
proposed as a new specialty. Two of the three specialties in
group B (Weather Equipment and Airborne Meteorological/Atmospheric
Research) and in group D (Radio Relay Equipment and Ground Radio
Communications Equipment) had also been formally proposed earlier.
However, the third specialty in group B (Aircraft Control and Warning
Radar) and the third specialty in group D (Space Communications Systems
Equipment) had not been proposed as possibilities for merging within
those groups. Also, groups A, C, F, and G, although easily explained
by knowledgeable electronics experts, had not been proposed, prior to
this analysis, as new consolidated specialties.


Table 4

Specialties That Have Little Commonality with
Other Communication Electronics Maintenance Specialties

30353  Auto Tracking Radar Repair                                23   83
30455  Television Equipment Repair                               32   81   25   NA
30652  Telecommunication Systems/Equipment Maintenance           11   41   89
30750  Telecommunications Systems Console Specialist/Attendant


Table 5

Specialties That Have Potential for Consolidation

                                                  Average  Percent  Percent      Correlation
                                                  Percent  Used By  Overlap      of Percent
                                                  Used     Any      Between      Members
                                                                    Specialties  Performing

A  30554  Electronic Computer Systems Specialist    21       56
   30651  Elec-Mech Comm & Crypto Equip Sys Spec    21       54       95           .94

B  30250  Weather Equipment Specialist              37       87
   30352  Radar Specialist                          37       89       93           .93
   30251  ABN Meteorological/Atmospheric Res
          Equip Repair                              23       40       89           .77*

C  30451  Flight Facilities Equip Repair            33
          Space Surveillance Radar Repair           29       74

   30150  Radio Relay Equipment Repair

The  empirical  groupings  of  specialties  based  on  similarity  of 
principles  used  as  measured  by  the  EPI  have  been  supported,  in  some 
cases,  by  prior  recommendations,  and  in  other  cases,  by  expert  judgment. 
Although additional analyses are required prior to recommending implementation
of the new consolidated specialties, the empirical procedures
have  provided  an  important  service  by  reducing  significantly  the 
number  of  comparisons  that  should  be  made.    In  this  study,  for  example, 
190  comparisons  of  pairs  of  specialties  would  have  to  be  made  to 
examine  all  possible  pairwise  combinations,  and  1140  combinations  of  3 
specialty  groups  must  be  studied  in  order  to  examine  all  groupings  of 
that  size.    This  analysis  has  narrowed  the  number  of  groups  to  be 
included  in  subsequent  studies  to  only  7. 
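
The comparison counts quoted above follow directly from the number of ways to choose 2 or 3 specialties out of the 20 analyzed. A short check using the standard library:

```python
from math import comb

n_specialties = 20  # specialties included in this analysis

pairs = comb(n_specialties, 2)    # all possible pairwise comparisons
triples = comb(n_specialties, 3)  # all possible groups of three

print(pairs, triples)  # 190 1140
```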

Plans  for  the  Future 

The  results  of  this  analysis  indicate  that  the  technique  of 
grouping  specialties  using  EPI  data  has  considerable  promise.    Many  of 
the  groupings  are  logical  and  might  have  been  expected.    However,  some 
of the groupings were not expected and require further detailed analysis.
All  of  the  potential  consolidation  groupings  should  be  viewed  as 
tentative  and  would  be  finalized  only  after  additional  analyses  of 
occupational  survey,  manning,  recruiting,  and  planning  data  have  been 
performed. The EPI data have suggested promising
empirical groupings of specialties, something that was heretofore not
possible. The EPI analysis has significantly reduced the scope of the
comparison  problem. 

Several  additional  studies  are  planned  in  the  near  future.  The 
analyses  tentatively  reported  in  this  paper  will  be  performed  again 
once  data  for  the  complete  array  of  specialties  have  been  collected. 
Similar  analyses  will  be  performed  for  the  total  electronic  community 
and  within  career  areas  within  the  electronic  community. 

Scales  could  be  developed  to  further  enhance  the  power  of  the 
EPI. Appropriate scales include measures of "complexity" or "difficulty"
of  the  items.    Once  the  scales  have  been  developed,  grouping  could  be 
performed  on  a  measure  of  percent  performing  by  (times)  complexity. 
In  addition,  the  new  scale  information  would  be  quite  useful  to  training 
specialists  in  course  development. 
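
A minimal sketch of the "percent performing by (times) complexity" measure follows. The specialty names, item names, percentages, and complexity ratings are all hypothetical; the paper does not define the scale or its range.

```python
# Hypothetical data: percent of incumbents in each specialty who report
# performing an item, and a hypothetical complexity rating for each item.
percent_performing = {
    "specialty_a": {"item_1": 80.0, "item_2": 10.0},
    "specialty_b": {"item_1": 75.0, "item_2": 20.0},
}
complexity = {"item_1": 2.0, "item_2": 5.0}


def weighted_profile(percents, weights):
    """Return each item's percent performing multiplied by its complexity."""
    return {item: pct * weights[item] for item, pct in percents.items()}


profiles = {
    name: weighted_profile(percents, complexity)
    for name, percents in percent_performing.items()
}
print(profiles)
```

Grouping would then compare these weighted profiles across specialties instead of the raw percentages, so that overlap on complex items counts for more than overlap on simple ones.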

Ultimately  it  might  be  useful  to  factor  analyze  the  EPI  so  that  a 
shorter  non-redundant  version  could  be  used.    Such  analysis  would  be 
possible  because  there  is  considerable  overlap  among  items  in  the 
inventory.    This  overlap  is  inherent,  since  the  inventory  covers 
knowledge,  principle,  task,  and  skill  items  of  various  degrees  of 
specificity.    The  shorter  version  would  be  more  efficient  in  terms  of
data  collection  and  analysis,  and  would  allow  for  "cleaner"  grouping
analyses. 

The  possible  uses  of  EPI  data  have  been  enumerated  elsewhere 
(O'Connor, Ruck & Driskill, 1978). Clearly, the utility of the
instrument  is,  in  large  part,  due  to  its  universality.    This  paper  has 
described  one  of  many  practical  applications  of  the  EPI  data  base. 
However,  additional  work  is  required,  both  in  the  development  of 
analysis techniques, and in the refinement of the EPI.

Differential Field Assignment Patterns for
Male and Female Soldiers¹


INTRODUCTION 


Background 

Along  with  the  other  military  services,  the  United  States  Army 
has been traditionally an almost totally male institution. Binkin and
Bach  (1977)  have  outlined  the  minimal  role  played  by  women  in  the 
military  prior  to  World  War  II,  their  significant  contribution  during 
that  war,  and  the  consequences  of  the  recent  expanded  role  of  women  in 
the  military.     This  expanded  role,  however,  has  not  been  accomplished 
without  controversy.    And  the  disputes  regarding  the  role  of  women  led 
to  Army  management's  perception  of  the  need  for  information  on  female 
enlisted  personnel,  particularly  with  respect  to  their  performance. 

Accordingly,  since  1972,  considerable  attention  has  been  given  to 
assessing  the  effect  of  expanding  the  role  of  women  in  the  Army.  A 
number  of  different  studies  of  women  in  the  military  have  been  conducted. 
Two major research efforts by the Army Research Institute for the
Behavioral  and  Social  Sciences  (ARI)  have  concerned  measuring  the  impact 
of  female  participation  on  performance.     One  of  these  investigations, 
called  MAX  WAC,  involved  a  72-hour  field  exercise  and  assessed  the  impact 
of  varying  levels  of  female  content  on  unit  performance  (Army  Research 
Institute,  1977).     Later  research,  known  as  REF  WAC,  evaluated  individual 
and  group  performance  during  an  extended  field  training  exercise  (Johnson, 
Cory,  Day,  &  Oliver,  1978). 

Problem 

During the REF WAC data collection, there were comments from the REF
WAC participants concerning differential treatment of men and women.
These subjective impressions were supported by the pretest and posttest
questionnaire responses, which showed that sizeable proportions of
respondents (officers, NCO's, enlisted men, enlisted women) reported
differential treatment of male and female soldiers by officers and NCO's.
In  general,  about  a  third  to  more  than  a  half  of  the  respondents  believed 
men  and  women  were  treated  differently. 

One  mode  of  differential  treatment  may  be  the  assignment  of  different 
jobs  to  men  and  women.     Inspection  of  the  REF  WAC  work  availability  data 
did,  in  fact,  indicate  that  differential  assignment  patterns  occurred 
during  the  field  training  exercise.     Specifically,  it  was  found  that  the 
mean number of regular work hours was greater for male enlisted personnel
than for female enlisted personnel, while the mean number of special duty
hours was greater for female enlisted personnel than for male enlisted
personnel. Accordingly, it was decided to identify the differential patterns
of regular and special duty assignments for men and women soldiers and to
investigate the relationship of these patterns to possible causal variables.

¹The authors wish to express their appreciation to Mr. Sidney Sachs for
his invaluable contributions to the data analysis.

One  variable  that  might  be  related  to  differential  assignment  patterns 
is  the  mission  of  the  unit.    Since  different  types  of  units  have  different 
functions,  assignment  patterns  could  vary  with  the  type  of  unit.  And 
although  the  REF  WAC  data  analysis  showed  that  women  had  more  special  duty 
assignments  than  men,  no  analysis  was  made  of  the  number  of  times  special 
duty  was  assigned  to  each  person,  nor  was  the  type  of  special  duty  broken 
out for men and women. Physical difficulty of the Military Occupational
Specialty (MOS)² might be another factor affecting differential assignments,
since the REF WAC results indicated that supervisors were strongly influenced
by this variable in making hypothetical assignments to jobs. Finally, it
was felt that a soldier's level of competence might affect the type of duty
received. Performance ratings, then, might be related to assignment
patterns.

Research  Questions 

The  following  research  questions  were  investigated: 

1.  What  are  the  relationships  among  gender,  type  of  unit,  and  type  of 
duty? 

2.  How  is  the  frequency  of  special  duty  related  to  gender? 

3.  How  is  type  of  special  duty  task  related  to  gender? 

4.  How  is  physical  difficulty  of  DMOS  related  to  assignment  patterns? 

5.  How  are  daily  performance  ratings  related  to  assignment  patterns? 


METHOD 


The  subjects,  instruments,  and  procedures  used  in  the  REF  WAC 
research  project  are  described  in  detail  in  Johnson  et  al.  (1978).  Brief 
descriptions  of  the  methodological  aspects  which  pertain  specifically  to 
the  research  reported  here  are  given  below. 


²In the Army, a Military Occupational Specialty (MOS) is a grouping of
duty positions requiring similar qualifications and the performance of
closely related duties. This job (e.g., clerk-typist, wheel vehicle
mechanic) may be the soldier's primary one (Primary Military Occupational
Specialty, or PMOS), his or her secondary one (Secondary Military
Occupational Specialty, or SMOS), or the one which is his or her duty
assignment (Duty Military Occupational Specialty, or DMOS). The DMOS may be
the same as the soldier's PMOS or SMOS, or it may be an entirely different MOS.

Subjects


The  population  for  this  research  included  the  enlisted  personnel  of 
22  maintenance,  medical,  military  police,  signal,  and  supply  and  transportation 
units.    These  companies  were  among  those  participating  in  REFORGER  77,  which 
involved  an  extended  field  training  exercise  conducted  in  West  Germany  in 
the  autumn  of  1977.    The  22  units  were  selected  because  they  contained 
women  in  sufficient  numbers  to  provide  a  meaningful  sample. 

Members of the male and female cohorts of the REF WAC project
constituted the sample for the research reported in this paper. All
women in the selected companies were contained in the female
cohort. A male from the same company was matched as closely as possible
with  each  female  on  the  basis  of  paygrade,  length  of  service,  MOS,  age, 
and  intelligence  test  score  (see  Johnson  et  al.,  1978,  p.  11-10).  These 
matched  males  constituted  the  male  cohort. 
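
The matching step described above can be sketched as a nearest-neighbor search within a company. The distance function, its weights, the field names, and the sample records below are illustrative assumptions; the actual matching rules are documented in Johnson et al. (1978), not here.

```python
def match_male(female, male_pool):
    """Pick the male from the same company who is closest to `female`.

    Closeness is a simple sum of scaled absolute differences, with a
    fixed penalty for an MOS mismatch. This scoring is hypothetical.
    """
    def distance(male):
        return (
            abs(male["paygrade"] - female["paygrade"])
            + abs(male["months_service"] - female["months_service"]) / 12
            + abs(male["age"] - female["age"])
            + abs(male["test_score"] - female["test_score"]) / 10
            + (0 if male["mos"] == female["mos"] else 5)
        )

    # Only males from the same company are eligible.
    candidates = [m for m in male_pool if m["company"] == female["company"]]
    return min(candidates, key=distance) if candidates else None


# Hypothetical records for illustration only.
female = {"company": "A", "paygrade": 3, "months_service": 18,
          "age": 20, "test_score": 105, "mos": "MOS_X"}
male_pool = [
    {"id": "m1", "company": "A", "paygrade": 3, "months_service": 20,
     "age": 21, "test_score": 100, "mos": "MOS_X"},
    {"id": "m2", "company": "A", "paygrade": 5, "months_service": 60,
     "age": 27, "test_score": 110, "mos": "MOS_Y"},
    {"id": "m3", "company": "B", "paygrade": 3, "months_service": 18,
     "age": 20, "test_score": 105, "mos": "MOS_X"},
]

best = match_male(female, male_pool)
print(best["id"])
```

A real procedure would also remove each matched male from the pool so that he cannot be paired twice. Because matching is only "as close as possible", residual differences between cohorts (for example in MOS) can remain, which matters for the interpretation of the results.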

Instrument 

Data  analyzed  in  this  investigation  were  obtained  from  the  Schedule  4 
form,  "Daily  Record  of  Work  Availability  and  Performance."  This 
instrument,  described  in  Johnson  et  al.   (1978,  pp.  11-18  and  11-21),  was 
used to record supervisors' performance ratings and reports of work
availability  for  each  member  of  the  male  and  female  cohorts.     The  daily 
performance  ratings  were  made  on  a  seven-point  Likert-type  scale  ranging 
from  "performed  all  tasks  in  a  superior  manner"  to  "performed  all  tasks 
in  an  inferior  manner."    The  work  availability  data  consisted  of  records 
of  assigned  hours  and  lost  time.     Only  the  assigned  hours  for  regular 
duty  and  for  special  duty  were  of  interest  to  the  research  reported  in 
this  paper. 

Data  Collection 

The  Schedule  4  (Daily  Record  of  Work  Availability  and  Performance) 
data  were  collected  by  noncommissioned  officers  (NCO's)  assigned  to  each 
company.    Each  day  during  the  field  training  exercise,  the  NCO  data 
collectors  obtained  performance  ratings  on  each  member  of  the  male  and 
female  cohorts  in  their  unit.     The  performance  ratings  were  made  by  the 
individual's  regular  supervisor  and/or  the  supervisor  of  the  special  duty 
to  which  the  soldier  was  assigned.     The  number  of  hours  on  regular  duty 
and  on  special  duty  was  also  recorded. 

Design  and  Data  Analysis 

Variables.     The  independent  variable  which  was  the  principal  focus 
of  this  research  was  gender.     That  is,  the  primary  comparisons  were  of 
male  and  female  enlisted  personnel.     The  other  major  independent  variable 
of  interest  was  type  of  unit  (maintenance,  medical,  military  police, 
signal,  and  supply  and  transportation).     Both  gender  and  type  of  unit  were 
included  in  all  analyses.     (Due  to  small  n's,  however,  comparisons  among 
units were not always meaningful.) For some analyses, the following
independent variables were also included: duty assignment patterns
(personnel with regular duty only, personnel with special duty and regular
duty) and type of duty day (days with special duty, days with no special
duty).

The dependent variables which were used in the analyses reported in
this  paper  included  the  following: 

1.  Regular  duty  hours:    mean  number  of  hours  per  day  spent  on 
regular  duty. 

2. Special duty hours: mean number of hours per day spent on
special  duty. 

3.  Frequency  of  special  duty:    number  of  times  a  given  individual 
was  assigned  to  special  duty. 

4. Type of special duty: task assigned to soldier on special duty, namely
vehicle maintenance (military police units only), guard duty, kitchen
police, or "other" (see Johnson et al., 1978, pp. III-19 to III-21).

5. Difficulty of MOS: The physical difficulty of the Duty Military
Occupational Specialty (DMOS) was evaluated according to the classification
made in Johnson et al., 1978 (see pp. III-21 and III-22).


"1"  =  all  tasks  can  be  performed  by  a  woman  in  a  field  environment. 

"2"  =  most  tasks  can  be  performed  by  a  woman  in  a  field  environment. 

"3"  =  few  tasks  can  be  performed  by  a  woman  in  a  field  environment. 

If  no  DMOS  was  recorded,  the  subject's  Primary  Military  Occupational 
Specialty  (PMOS)  was  used.    A  few  subjects  had  MOS  which  were  not  contained 
in  the  classification  found  in  Johnson  et  al.  (1978).    In  these  cases, 
the  MOS  was  evaluated  by  the  second  author  in  consultation  with  a 
senior  military  officer.    For  analysis  purposes,  categories  "2"  and  "3" 
were  combined.    All  MOS  classified  as  "1"  were  defined  as  "easier  MOS," 
and  the  remaining  MOS  were  considered  "harder  MOS." 
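
The coding rule just described (use the DMOS, fall back to the PMOS, and pool categories "2" and "3") can be sketched as below. The lookup table and the MOS codes in it are hypothetical placeholders, not entries from the Johnson et al. (1978) classification.

```python
# Hypothetical lookup table: MOS code -> difficulty category (1, 2, or 3).
DIFFICULTY = {"MOS_X": 1, "MOS_Y": 2, "MOS_Z": 3}


def mos_difficulty(dmos, pmos, table=DIFFICULTY):
    """Return "easier" or "harder" for a soldier, or None if unclassified."""
    mos = dmos if dmos else pmos  # fall back to PMOS when no DMOS was recorded
    category = table.get(mos)
    if category is None:
        # In the study, such cases were settled by expert judgment.
        return None
    # Category 1 is "easier"; categories 2 and 3 are pooled as "harder".
    return "easier" if category == 1 else "harder"


print(mos_difficulty("MOS_X", "MOS_Z"))  # DMOS takes precedence
print(mos_difficulty(None, "MOS_Z"))     # PMOS used as the fallback
```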

6. Performance ratings: mean ratings of performance for special
duty  or  for  regular  duty. 

Analyses.    Several  kinds  of  analyses  were  employed  to  explore  the 
relationships  among  the  variables  described  above.    The  principal 
analysis  involved  a  repeated  measures  ANOVA  which  investigated  the 
interrelationships  of  gender,  type  of  unit,  and  type  of  duty.    The  two 
between-subjects  factors  were  gender  (male  and  female)  and  type  of  unit 
(maintenance, medical, military police, signal, and supply and transportation).
The within-subjects factor was type of duty (regular and special).

In addition to the ANOVA, several chi square tests were conducted to
examine the interrelationships among the variables. For the breakdown of
special duty by frequency and type for male and female enlisted personnel,
only the totals for all types of companies were used for the purpose of
analysis. Although there appeared to be differences among the units, the
n's were so small that more detailed analyses did not seem warranted.

Chi square tests were also employed in assessing the relationship
between special duty and MOS difficulty. The first test was of the
number of males and females without special duty who were in harder and
easier MOS. (Since inspection of the data revealed males and females
with special duty were almost identically distributed in harder and
easier MOS, no statistical test was run for this group.) The second chi
square test involved the numbers of males and females (not broken down by
whether or not they had had special duty) in harder and easier MOS.
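
For readers unfamiliar with the test, a minimal sketch of a Pearson chi square computation on a gender by MOS-difficulty table follows. The counts are hypothetical, and no continuity correction is applied; the paper does not say whether one was used.

```python
def chi_square(table):
    """Pearson chi square statistic and degrees of freedom for a table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows = males, females; columns = harder, easier MOS.
stat, df = chi_square([[30, 20], [20, 30]])
print(stat, df)
```

With one degree of freedom, the statistic is compared with the tabled critical value of 3.84 for the .05 level.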

No  statistical  tests  were  performed  on  the  performance  rating  data. 
Means  for  the  three  types  of  duty  days  were  so  similar  for  men  and  women 
that further analysis was considered unnecessary.

RESULTS


Gender, Type of Unit, and Type of Duty

Table  1  contains  means  and  standard  deviations  for  regular  and  special 
duty  hours  for  males  and  females  by  type  of  unit.    The  results  show  that 
for  each  type  of  unit  males  have  more  hours  of  regular  duty  than  females. 
However,  for  special  duty,  with  the  exception  of  the  military  police  units, 
females  have  more  hours  of  special  duty  than  the  males.    This  pattern  also 
appears  in  the  totals  for  all  males  and  all  females. 

Table 2 presents the results of the repeated measures analysis of
variance. The results of the analysis revealed that all three main effects
(for gender, type of unit, and type of duty) were significant beyond the
.001 level. Two two-way interactions were also significant: type of duty
by gender (p < .001) and type of duty by type of unit (p < .001). There was
no significant interaction between gender and type of unit or for gender,
type of unit, and type of duty.
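
The type of duty by gender interaction can be read descriptively as a difference of differences between cell means. The sketch below uses hypothetical hours, not the values in Table 1, and it is a descriptive summary only, not the repeated measures ANOVA itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one row per soldier, holding both within-subjects
# measures (mean daily hours of regular duty and of special duty).
records = [
    {"gender": "male", "unit": "medical", "regular": 9.0, "special": 0.5},
    {"gender": "male", "unit": "signal", "regular": 10.0, "special": 1.0},
    {"gender": "female", "unit": "medical", "regular": 8.0, "special": 1.5},
    {"gender": "female", "unit": "signal", "regular": 8.5, "special": 2.0},
]


def cell_means(rows, between="gender", within=("regular", "special")):
    """Mean daily hours for each between-subjects level by duty type."""
    cells = defaultdict(list)
    for row in rows:
        for duty in within:
            cells[(row[between], duty)].append(row[duty])
    return {key: mean(values) for key, values in cells.items()}


means = cell_means(records)

# Difference of differences: the male-female gap on regular duty minus the
# male-female gap on special duty. A value far from zero is the descriptive
# signature of a gender by duty interaction.
interaction_contrast = (
    (means[("male", "regular")] - means[("female", "regular")])
    - (means[("male", "special")] - means[("female", "special")])
)
print(means, interaction_contrast)
```

In these made-up data the gap reverses sign between duty types (men higher on regular duty, women higher on special duty), which is the same crossover pattern the paper reports.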

Frequency  of  Special  Duty 

Table  1  shows  differences  between  males  and  females  for  mean  number 
of  hours  of  regular  duty  and  special  duty  for  each  type  of  unit.    Table  3 
presents  a  breakout  of  how  many  times  individuals  were  assigned  special 
duty  in  each  type  of  unit.    As  can  be  seen  in  the  table,  most  males  and 
females  who  have  special  duty  have  it  only  one  or  two  times.    Of  the  five 
types  of  units,  the  military  police  and  signal  units  stand  out  as  the  only 
ones  in  which  the  number  of  males  who  have  one  or  two  instances  of  special 
duty  exceeds  the  number  of  females  with  one  or  two  instances  of  special  duty. 
It  is  also  apparent  from  the  table  that  of  all  soldiers  who  had  special 
duty  three  or  more  times,  there  were  considerably  more  women  than  men. 

When a chi square test was performed on the total number of males
and females who had special duty one or more times (see Table 4), there
was a statistically significant difference between males and females for
frequency of special duty (χ² = 6.97, p < .01). This finding most probably
was due to the fact that the number of females who had special duty three
or more times was much greater than the number of males who had special
duty three or more times.

Type  of  Special  Duty 

When investigating patterns of special duty assignments, it is of
interest to ascertain what kinds of tasks are assigned for special duty
and if these tasks differ for men and women. It can be seen in Table 5
that most instances of special duty were either guard duty or kitchen
police. The pattern of more special duty for women that was previously
noted can be observed here also. In three types of units (medical,
maintenance, and supply and transportation), women have more instances of
guard duty and kitchen police duty than men. However, a chi square test
(see Table 6) revealed no statistically significant difference between
males and females in the number of instances of guard duty and kitchen
police duty (χ² = .002, p > .05).

Difficulty  of  MOS 

The totals in Table 7 indicate that a larger proportion of both
males and females with special duty were in easier MOS (72% and 71%,
respectively) than were males and females with no special duty (38%
and 51%, respectively). There was virtually no difference between the
proportions of men and women with special duty in either easier or
harder MOS. The pattern differed, however, for personnel with no special
duty. For these individuals, males tended to be concentrated in the
harder MOS (62% males vs. 49% females), and a chi square test
(χ² = 5.05, df = 1; see Table 8) showed this difference to be significant
at α = .05.

In spite of the matching procedure followed in selecting the male
cohort, Table 9 demonstrates that there was a significant difference in
MOS difficulty between males and females, with males tending to be in
harder MOS and females in easier MOS. This difference is significant at
α = .05 (χ² = 5.0, df = 1).
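
As a rough check on the two significance levels reported in this subsection, the upper-tail probability of a chi square statistic with one degree of freedom can be computed from the complementary error function. This is a sketch for the reader's convenience; the labels are ours and the computation is not part of the original analysis.

```python
from math import erfc, sqrt


def p_value_df1(chi2):
    """Upper-tail probability of a chi square statistic with 1 degree of
    freedom, using P(X > x) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(chi2 / 2.0))


reported = [
    ("personnel with no special duty (Table 8)", 5.05),
    ("all personnel (Table 9)", 5.0),
]
for label, chi2 in reported:
    print(f"{label}: chi2 = {chi2}, p = {p_value_df1(chi2):.3f}")
```

Both statistics fall between the .05 critical value (3.84) and the .01 critical value (6.63) for one degree of freedom, which agrees with the reported significance at α = .05.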

Performance 

Table  10  presents  mean  performance  scores  for  males  and  females  in 
each  type  of  unit  for  three  different  types  of  duty  days:     (1)  days  of 
special  duty  for  personnel  with  special  duty,   (2)  days  of  regular  duty 
for  personnel  with  special  duty,  and  (3)  days  of  regular  duty  for  personnel 
with  regular  duty  only.    As  can  be  observed  in  the  table,  the  data  do  not 
suggest  any  relationship  between  performance  ratings  and  type  of  duty  day. 
Therefore no statistical test was performed.

DISCUSSION


Limitations 

The  findings  of  this  research  must  be  considered  with  an  awareness 
of  the  limitations  of  the  investigation.     First,  the  data  collection  was 
constrained  by  the  general  requirement  of  noninterference  in  the  normal 
activities  of  the  subjects.    An  additional  problem  was  the  lack  of  time 
for  the  training  of  data  collectors  with  consequent  adverse  effects  on 
reliability. In retrospect, it appeared that the definitions of terms
and categories were not always clear to the data collectors and the
supervisors from whom they collected the data. The result of this lack
of  consistent  definition  was  that  different  units  (and  different  data 
collectors)  recorded  time  in  different  ways.     Tasks  assigned  as  regular 
duty  in  one  unit,  for  example,  might  be  considered  special  duty  in 
another.     Shifts  in  one  type  of  unit  might  be  eight  hours  in  length, 
while  in  another  unit  shifts  were  considered  24  hours  long  because  the 
"on  call"  time  was  included.     In  addition,  there  were  the  usual  individual 
differences  among  data  collectors  contributing  additional  variance. 

Because in many cases the data were based on small n's, statistical
comparisons  could  be  made  only  for  totals.     Yet  it  is  possible  that 
comparisons  across  different  types  of  units  may  not  always  be  meaningful 
due  to  differences  in  unit  mission  or  to  differences  in  classification 
of  the  performance  variables,  etc.    Hence,  generalizations  from  these 
results  should  be  made  cautiously.    The  research  does,  however,  demonstrate 
interpretable trends and suggest lines of investigation for future
research. 

Conclusions 

Gender,  type  of  unit,  and  type  of  duty.    Of  the  three  significant 
main  effects  found  in  the  repeated  measures  ANOVA,  the  one  for  gender 
was  of  greatest  interest.     It  can  be  concluded  that,  on  the  average,  men 
worked  significantly  more  hours  than  women  during  the  field  training 
exercise.     The  main  effect  for  type  of  unit  showed  that  the  total  amount 
of  time  worked  by  enlisted  personnel  varied  significantly  as  a  function 
of  the  type  of  unit.    These  differences  in  total  time  among  units  may  be 
due,  at  least  in  part,  to  differences  in  the  units'  missions.    As  noted 
above,  however,  some  of  the  difference  may  have  been  due  to  variations 
in  interpreting  terms.     For  example,  some  supervisors  considered  time 
"on  call"  as  part  of  the  regular  shift  and  other  supervisors  did  not. 
That  the  daily  average  for  regular  duty  hours  would  differ  significantly 
from  the  special  duty  average  was  obvious  from  visual  inspection  of  the 
data  and  not  of  interest  in  and  of  itself. 

The pattern of differential assignment was revealed by the significant
interaction between gender and type of duty. It was clear that females
had significantly more special duty and males had significantly more
regular duty during the field exercise. This pattern was noted for all
units for both types of duty except for military police units in which
women had less special duty. The interaction of type of unit and type
of duty suggested that units varied in terms of the relative amount of
regular and special duty assigned. Again, this finding may have been
affected by differences in recording data. However, no interaction
occurred between type of unit and gender. Thus, it can be concluded
that the difference between men and women in total time assigned did not
vary as a function of type of unit. The three-way interaction was not
significant, either, indicating no reliable differences beyond those
occurring in the two-way interactions.

Frequency  of  special  duty.     Because  the  n's  are  small,  no  conclusions 
can  be  reached  concerning  the  frequencies  of  special  duty  for  different 
types  of  units  (Table  3).    Overall,  however,  it  is  clear  that  women  not 
only  had  significantly  higher  daily  averages  for  special  duty  than  men 
(as  demonstrated  by  the  ANOVA  results  in  Table  2)  but  that  they  tended 
to be assigned to it more often than men (Table 4).

Type  of  special  duty.     When  special  duty  was  assigned,  the  most  frequent 
kinds  were  guard  duty  or  kitchen  police  duty.    A  chi  square  test  showed 
that  the  numbers  of  men  and  women  assigned  to  these  jobs  did  not  differ 
significantly.     Therefore,  it  can  be  concluded  that  no  bias  seemed  to  be 
operating  in  the  type  of  jobs  assigned  to  the  special  duty  soldiers. 
That  is,  men  did  not  tend  to  draw  guard  duty,  and  women  were  not  necessarily 
put  on  kitchen  police  duty. 

Difficulty of MOS. Difficulty of MOS appeared to have no relationship
to gender for personnel who had been assigned special duty since virtually
identical proportions of males and females fell into the "harder" and
"easier" MOS classifications. For personnel with no special duty, however,
the pattern differed significantly, with more men concentrated in the
harder MOS category. As is apparent from Table 7, almost three-fourths of
the people with special duty had easier MOS. Females with no special duty
tended to be evenly balanced between harder and easier MOS, while 62%
of men with no special duty were in harder MOS. This difference (see
Table 8) proved to be significant (χ² = 5.05, df = 1, p < .05). The pattern
shown in Table 7 suggests that supervisors tended to assign people from
easier MOS to special duty. It may be that people in easier MOS are more
interchangeable and that those in harder MOS are difficult to replace due
to the physical demands of the job. It would make sense, then, for a
supervisor to assign the more easily replaced soldier to special duty.
One interpretation of these findings is that women may get special duty
more often not because they are women but because they are in easier MOS.
(Women may, of course, be concentrated in easier MOS to begin with because
of gender bias.)
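
The interpretation offered above can be illustrated with a small stratified comparison. The counts below are hypothetical and were chosen only to show how an overall male-female difference in special duty can arise when assignment rates are identical within each MOS difficulty level; they are not the study's data.

```python
# Hypothetical counts per stratum: (soldiers, soldiers given special duty).
counts = {
    "female": {"easier": (70, 35), "harder": (30, 6)},
    "male": {"easier": (40, 20), "harder": (60, 12)},
}


def overall_rate(strata):
    """Proportion given special duty, pooled over MOS difficulty levels."""
    n = sum(total for total, _ in strata.values())
    k = sum(special for _, special in strata.values())
    return k / n


# Pooled rate for each gender, ignoring MOS difficulty.
overall = {gender: overall_rate(strata) for gender, strata in counts.items()}

# Rate for each gender within each MOS difficulty level.
within = {
    gender: {level: special / total for level, (total, special) in strata.items()}
    for gender, strata in counts.items()
}
print(overall)
print(within)
```

In this made-up example the pooled rate is higher for women, yet within the easier and the harder MOS the rates for men and women are the same. The overall gap comes entirely from the different distribution of men and women across MOS difficulty.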

Performance. It was felt that performance might be related to
special duty with lower performers being selected for this type of duty
because, like soldiers in easier MOS, they would be more easily replaced
or compensated for. The results showed, however, that there was no
relationship  between  performance  ratings  and  type  of  duty  for  either  those 
who  had  had  only  regular  duty  or  those  who  had  been  assigned  to  special 
duty  one  or  more  times. 

Implications 

Perhaps  the  principal  contribution  of  this  research  is  to  illustrate 
that  investigations  of  male-female  differences  should  not  examine  the 
gender  variable  in  isolation.    The  obvious  conclusion  when  significant 
differences  are  found  between  male  and  female  groups  (especially  in  the 
Army) is that bias is operating. Yet such is not necessarily the case.
Here, for example, women receive significantly more special duty than
men. But of those persons assigned special duty, there was no difference
between males and females in type of duty assigned. The traditional
male  job  (guard  duty)  was  no  more  likely  to  be  assigned  to  males  than  to 
females,  and  the  same  pattern  held  for  the  traditional  female  job  (kitchen 
police  duty).    Analyses  related  to  MOS  difficulty  suggested  that  it  may 
have  been  this  variable,  rather  than  gender,  which  was  important  in  the 
selection  of  people  for  special  duty.    Special  duty  personnel  tended  to 
come  from  the  easier  MOS.    And  although  there  was  an  attempt  to  match 
males  and  females  on  the  MOS  variable,  women  may  have  been  overrepresented 
on  special  duty  because  there  was  a  significant  difference  in  the  way 
men and women were assigned to MOS. In addition to being an example of
the need to look beyond overall gender differences, this research has
further implications, described below.

Future  research.    The  findings  described  in  this  paper  have  implications 
for  future  research  on  assignment  patterns.     If  similar  data  are  collected 
again,  every  effort  should  be  made  to  obtain  more  precise  definitions  of 
the  categories  to  be  used.     "Regular  duty,"  for  example,  should  draw  from 
the  same  set  of  behaviors  for  every  subject.    Data  collectors  should 
receive  sufficient  training  in  defining  terms  and  behaviors  and  in  recording 
data,  and  they  should  be  able  to  transmit  this  knowledge  to  the  supervisors. 
It  would  be  important  to  know  who  was  doing  what,  when  and  why  they  were 
doing  it,  and  for  how  long  and  under  what  circumstances.    The  leadership 
aspect  of  assignments  should  be  delineated  in  detail — who  makes  what 
assignments  for  whom,  and  why  these  people  were  selected.    The  Schedule  4 
performance  ratings  used  in  this  research  were  found  to  have  some  validity 
in  the  REF  WAC  research  since  two  other  REF  WAC  observational  performance 
measures  yielded  similar  findings  concerning  male-female  performance 
(Johnson  et  al.,  1978).    But  reliability  data  concerning  the  duty  categories 
are  lacking,  and  such  data  should  be  provided  for  future  research  efforts. 

In  some  of  the  data  analyses  described  in  this  paper,  comparisons  could 
not  be  made  within  each  type  of  unit  because  of  the  small  number  of  instances 
recorded.    Since  the  missions  of  different  types  of  units  vary,  it  may  not 
be  meaningful  to  combine  data  across  units.    Hence,  future  research  should 
endeavor  to  collect  enough  data  within  each  unit  type  to  permit  intra-type 
comparisons.

Male-female assignments. This research has relevance for those
concerned  with  patterns  of  male  and  female  assignments.    The  findings 
suggested  that  women  got  more  special  duty  during  the  extended  field 
training  exercise  not  because  they  were  women  but  because  they  were 
in  easier  MOS.    The  results  also  showed  that  men  worked  significantly 
more  hours  than  women.     If  the  proportion  of  women  in  the  Army  (or  in 
certain  Army  units)  rises  markedly,  men  may  be  increasingly  concentrated 
in  the  harder  MOS.     If,  at  the  same  time,  men  must  work  more  hours  (at 
least,  during  field  exercises),  morale  could  be  adversely  affected  and 
the  ability  to  accomplish  the  unit  mission  may  decrease.  Differential 
assignment  patterns,  then,  are  potentially  detrimental  and  may  impair 
organizational  effectiveness. 

SUMMARY


This paper has examined assignment patterns of male and female
enlisted personnel during a field training exercise. Although female
enlisted personnel averaged a greater amount of special duty per day
than did their male counterparts, men averaged more regular duty and
more total duty per day than did women. Women were found to be more
frequently assigned to special duty than were men. However, of those
people assigned to special duty, there was no discrimination in terms of
the type of duty assigned to men and women. Women were as likely to
have  guard  duty  as  men,  and  men  were  as  likely  to  have  kitchen  police 
duty  as  women.    It  was  also  found  that  significantly  more  women  than  men 
had  easier  MOS.    Of  those  personnel  assigned  to  special  duty,  almost 
three-fourths  had  easier  MOS;  the  pattern  was  identical  for  men  and 
women.    Of  personnel  never  assigned  to  special  duty,  less  than  one-half 
were  in  easier  MOS;  a  greater  concentration  of  men  than  women  were  in  harder 
MOS.    The  findings  suggested  that  supervisors  tended  to  select  special 
duty  people  from  those  with  easier  MOS,  perhaps  because  persons  in  easier 
MOS  were  more  interchangeable  and  their  absence  was  more  easily  compensated 
for. Since the matching of the male and female cohorts on MOS was imperfect,
it  could  not  be  determined  whether  the  differential  assignment  pattern 
was  due  to  the  overconcentration  of  women  in  easier  MOS  or  to  gender  bias. 
Unlike  MOS  difficulty,  performance  ratings  proved  to  be  unrelated  to  special 
duty.    The  conjecture  that  lower  performers  would  tend  to  be  assigned  to 
special  duty  was  not  confirmed. 

Thus,  the  primary  contribution  of  this  research  was  seen  as  an 
illustration  that  investigations  of  male-female  differences  cannot  consider 
the  gender  variable  in  isolation.    The  implications  of  the  research  were 
also  discussed  in  terms  of  methodological  considerations  for  future  research 
and the effect of differential assignment patterns on organizational
effectiveness.

Table 1

Means and Standard Deviations for Regular and
Special Duty Hours for Enlisted Personnel

                                          Duty Hours
                                 Regular Duty     Special Duty
Type of Unit               n      M       SD       M       SD      Total

Maintenance
  Male                    57    12.68    2.91     .55     1.08     13.23
  Female                  56    12.59    3.36     .86     3.79     13.45

Medical
  Male                    21     9.73    3.82     .23      .36      9.96
  Female                  19     7.41    2.84     .71      .07      7.81

Military Police
  Male                    39    11.04    1.62     .29      .56     11.33
  Female                  42     9.80    1.95     .16      .34      9.96

Signal
  Male                    31    12.42    2.24     .14      .31     12.73
  Female                  31    10.92    1.98     .21      .59     11.13

Supply & Transportation
  Male                    47    11.52    2.46     .36      .89     11.88
  Female                  54    10.11    3.08    1.45     3.02     11.56

Totals
  Male                   195    11.71             .35              12.06
  Female                 202    10.60             .76              11.36
  Entire Sample          397    11.15             .56              11.71

Table 2

Summary of Analysis of Variance for Gender, Type of Unit, and Type of Duty

                        Degrees of        Mean
Source of Variance       Freedom         Square         F Ratio

Between subjects
  Gender (A)                1             38.57         12.65***
  Unit (B)                  4             84.65         27.76***
  AB                        4              7.22          2.37 n.s.

Within subjects
  Duty (C)                  1          18423.17       2886.36***
  CA                        1            121.36         19.01***
  CB                        4             69.92         10.96***
  CAB                       4              9.87          1.55 n.s.

*** p < .001

Table 3

Frequency of Special Duty for Each Male
and Female Soldier by Type of Unit

                                         Frequency
Type of Unit               0 times   1-2 times   3 or more times   Total

Maintenance
  Male                       41         15              1            57
  Female                     32         20              4            56

Medical
  Male                       14          7              0            21
  Female                     10          6              3            19

Military Police
  Male                       28         11              0            39
  Female                     32          9              1            42

Signal
  Male                       24          7              0            31
  Female                     25          5              1            31

Supply & Transportation
  Male                       40          5              2            47
  Female                     35         11              8            54

Total
  Male                      147         45              3           195
  Female                    134         51             17           202

Table 4

Frequency of Special Duty for Males and Females

                   Frequency of Special Duty
Group              1-2             3 or more       Totals

Males           45 (39.72)^a        3 (8.27)          48
Females         51 (56.28)         17 (11.72)         68
Totals          96                 20                116

χ² = 6.97, p < .01

^a Expected frequencies in parentheses

Table 5

Frequency of Type of Special Duty Tasks Assigned to Enlisted Personnel

                                          Type of Task
                           Vehicle        Guard     Kitchen
Type of Unit               Maintenance    Duty      Police     Other^a    Total

Maintenance
  Males                                     4         19          1         24
  Females                      5                      31          1         37

Medical
  Males                        6                                  4         10
  Females                                  13          3          2         18

Military Police
  Males                                     6          6          0         12
  Females                                   5          4          4         13

Signal
  Males                                     6          1          1          8
  Females                                   4          3          0          7

Supply & Transportation
  Males                                    20          5          2         27
  Females                                  36         12          3         51

Total
  Males                        6           36         31          8         81
  Females                      5           58         53         10        126

^a This category includes unexplained tasks and tasks described generally
as "details."

Table 6

Number of Instances of Guard Duty and Kitchen Police
Duty Assigned to Enlisted Personnel

Enlisted                    Type of Duty
Personnel          Guard Duty       Kitchen Police     Totals

Males              37 (36.8)^a        32 (32.2)           69
Females            59 (59.2)          52 (51.8)          111
Totals             96                 84                 180

χ² = .002, p > .05

^a Expected frequencies in parentheses

Table 7

Numbers and Proportions of Males and Females in
Easier and Harder MOS by Type of Unit

                           Personnel with Special Duty      Personnel with No Special Duty
                           Easier MOS      Harder MOS       Easier MOS      Harder MOS
Type of Unit*               n      %        n      %         n      %        n      %

Maintenance
  Males                    12     75        4     25        22     48       24     52
  Females                  15     65        8     35        28     64       16     36

Medical
  Males                     7     88        1     13        11     85        2     15
  Females                   9    100        0                7     64        4     37

Signal
  Males                     4     58        3     43         5     12       38     88
  Females                   3     50        3     50        12     28       31     72

Supply & Transportation
  Males                     8     67        4     33        15     38       24     62
  Females                  13     72        5     28        20     61       13     39

Totals
  Males                    31     72       12     28        53     38       88     62
  Females                  40     71       16     29        67     51       64     49

*MP MOS data not available.

Table 8

Percent of Easier and Harder MOS for Males
and Females with No Special Duty

                             Type of MOS
Enlisted Personnel       Easier          Harder        Totals

Males                  53 (62.2)^a     88 (78.8)        141
Females                67 (57.8)       64 (73.2)        131
Totals                120             152               272

χ² = 5.05, p < .05

^a Expected frequencies in parentheses
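
The expected frequencies and the chi-square statistic shown in Table 8 can be checked with a short calculation. The sketch below is an illustration only, not the authors' procedure: the function name `chi_square` is hypothetical, and it applies the ordinary Pearson chi-square with no continuity correction, which is an assumption. Under that assumption the expected frequencies match the tabled values, and the statistic comes out at roughly 5.06, within rounding of the reported 5.05.

```python
def chi_square(table):
    """Return (expected, chi2) for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return expected, chi2


# Observed counts from Table 8: rows are males and females,
# columns are easier and harder MOS (personnel with no special duty).
observed = [[53, 88], [67, 64]]
expected, chi2 = chi_square(observed)

print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 2))
```

The same function can be applied to the counts in Tables 4, 6, and 9; small differences from the printed statistics are to be expected if the original computation used rounded expected frequencies.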


Table 9

Total Number of Males and Females with Easier and Harder MOS

                                 MOS
Enlisted Personnel       Easier          Harder        Totals

Males                  84 (94.7)^a    100 (89.3)        184
Females               107 (96.3)       80 (90.7)        187
Totals                191             180               371

χ² = 5.0, p < .05

^a Expected frequencies in parentheses

Table 10

Mean Performance Scores of Enlisted Personnel
for Three Types of Duty Days

                             Personnel with special duty            Personnel with regular
                         Days with special      Days with no              duty only
                               duty             special duty
Type of Unit               n     M     SD       n     M     SD        n     M     SD

Maintenance
  Males                   16    6.4   1.00     16    6.1    .80       41    6.0   1.06
  Females                 24    5.8   1.57     23    5.8    .92       32    6.0    .70

Medical
  Males                    7    5.6    .73           5.9    .52       14    5.4   1.02
  Females                  9    6.1   1.05           6.0    .78       10    5.6   1.23

Military Police
  Males                   11    5.5   1.77     11    5.8    .93       28    6.0    .76
  Females                 10    6.0    .48     10    5.9    .73       32    5.9    .73

Signal
  Males                         6.3    .45           5.7    .90       24    5.9    .65
  Females                       6.6    .56           5.4    .66       25    5.7    .77

Supply & Transportation
  Males                    7    5.1    .76      7    5.3   1.22       40    5.8    .69
  Females                 19    5.9   1.09     19    6.2   1.10       35    5.9    .82

Totals
  Males                   48    5.9   1.23     48    5.8    .92      147    5.9    .86
  Females                 64    6.0   1.21     67    5.9    .95      134    5.9    .81

THE PREMATURE ATTRITION OF NAVY FEMALE ENLISTEES


Background 

Gradually,  women  are  obtaining  greater  opportunities  in  the  Navy 
defense  establishment,  as  evidenced  by  progress  in  four  areas:     (1)  it  is 
easier  for  women  to  gain  entrance  into  the  Navy  and  Navy  programs  than  it 
has  been  previously,   (2)  women  are  participating  in  a  greater  range  of 
work  activities,   (3)  a  greater  number  of  training  opportunities  are  open 
to  them,  and  (4)  there  are  signs  that  they  are  being  accepted  by  Navy 
management  as  "one  of  the  Navy's  own." 

Four  historical  developments  indicate  that  it  is  becoming  progressively 
easier  for  women  to  gain  entrance  to  the  Navy  and  its  programs.     In  1967, 
the  two  percent  ceiling  on  women  in  each  service  was  abolished  by  Congress. 
In 1972, Admiral E. Zumwalt, the then Chief of Naval Operations, made women
eligible  for  NROTC  scholarships,  thereby  giving  them  access  to  another 
officer  training  program.     In  1974,  the  age  regulations  governing  selection 
procedures  throughout  the  military  were  standardized  so  that  women  were 
no  longer  required  to  be  older  than  men.     In  1976,  entrance  of  women  into 
the military academies was cleared by an amendment to the Defense
Appropriation Bill.

One  development,  in  particular,  illustrates  the  expansion  of  women's 
work  roles  in  the  Navy,  i.e.,  issuance  of  "Z-Gram  116"  by  Admiral  Zumwalt 
in  1972.     This  directive  specified  that  women  were  to  become  eligible  for 
(1)  all  enlisted  ratings,  (2)  shore  command  positions,  (3)  the  Chaplain 
and  Civil  Engineer  Corps,  and  (4)  flag  rank  (i.e.,  admiral  status)  within 
managerial  and  technical  specialties. 

One  indication  that  women  are  receiving  greater  training  opportunities 
in  the  Navy  was  the  issuance  in  1976  of  a  directive  requiring  women  to 
take  apprenticeship  training  if  they  were  not  eligible  for  "A"  School. 
This  training  provides  a  basic  shipboard  orientation  and  is  a  prerequisite 
for acquiring an apprenticeship position.

In addition to the expansion of women's work roles and training
opportunities, there are signs that women are gradually being accepted by Navy
management  as  "one  of  the  Navy's  own."    For  example,  their  uniform  has  been 
redesigned  to  make  it  more  compatible  with  their  new  work  roles.  Also, 
a  "summer  whites"  uniform,  previously  available  only  to  men,  is  currently 
being  tested  for  women. 

In summary, it can be stated that women increasingly are becoming a
more integral part of the Navy community and are shouldering a greater
share of the defense burden. Moreover, this trend is expected to continue:
the Chief of Naval Personnel (Note 1) has recommended that the Navy double
its percentage of women by 1983.


The opinions and assertions contained herein are those of the senior
author and are not to be construed as official or reflecting the views
of the Navy Department.


Problem 


Two  problems  gave  rise  to  the  current  study.    First,  although  the 
attrition  rate  of  Navy  female  first  enlistees  has  been  declining  since  1973, 
it is still considered to be too high, approximately 28 percent (see Thomas,
Note  2).     Secondly,  it  is  expected  that  the  attrition  rate  will  increase 
when  and  if  women  are  no  longer  required  to  have  a  high  school  diploma  for 
acceptance  into  the  Navy,  i.e.,  when  and  if  selection  requirements  for  women 
become the same as those for men. Research with men (Plag & Goffman, 1966;
Lockman  &  Gordon,  1977;  Sands,  1977)  has  consistently  demonstrated  that 
educational  level  is  the  most  valid  predictor  of  premature  attrition. 

Purpose 

The current study was an exploratory one, designed to lay the
groundwork for (1) an instrument which could be used in the relatively near future
for  screening  female  applicants  (Goal  1)  and,  to  a  lesser  extent,   (2)  a 
screening  instrument  which  could  be  used  when  and  if  female  applicants  are 
no  longer  required  to  have  a  high  school  degree  (Goal  2).    In  order  to  reach 
these  goals,  the  study  investigated  the  relationship  between  the  premature 
attrition of Navy female first enlistees and preenlistment variables, such
as personal history, demographic traits, and attitudes. It was believed
that these variables would be especially useful for reaching Goal 1. That
is, it was believed that these variables would be less attenuated for women,
and  thus  more  predictive,  than,  for  example,  most  of  the  variables  currently 
used  to  select  males  (Lockman  &  Gordon,  1977).     The  selection  procedure  for 
males utilizes age, mental level, educational level, and number of
dependents for which an applicant is financially responsible. However, the last
three variables are attenuated for women: mental level and educational
level because of the high school degree requirement and the last because
the male typically assumes responsibility for a family's financial
obligations. It should be noted that educational level and mental level should
become more useful as predictor variables when and if the high school
degree requirement is abolished for women. It was believed, however, that
additional variables will be needed in order to effectively predict attrition.
The  present  study  (Goal  2)  laid  the  groundwork  for  locating  them. 

Approach 

Instruments.    A  questionnaire  approach  was  utilized  in  the  study.  Two 
questionnaires  termed  Quest  1  and  2  were  constructed,  composed  of  common 
and  "unique"  items.     Common  items  were  those  which  were  identical  on  both 
questionnaires. Unique items were those which were found on one
questionnaire but not the other. The unique items on one questionnaire were, at
times, parallel in form to those on the other questionnaire. At other times,
unique items on one questionnaire measured totally different aspects of a
general construct, such as mental health, than did items on the other
questionnaire.

Each questionnaire was composed of 120 items which were conceptually
grouped  into  eight  areas,  as  shown  in  Table  1.    Copies  of  Quest  1  and  2 
are  available  from  the  senior  author. 

Table 1

Content Areas and Number of Items: Quest 1 and 2

Content Area                                   Number of Items

Personal History/Demography                          26
Female Role Ideology                                  6
Mental Health                                        24
Motivation to Fail                                    6
Realistic Expectations of Navy                        4
Enlistment Motivation                                21
Similarity to Previous Successful Recruits            6
Occupational Needs                                   27

TOTAL                                               120


Items relating to personal history and demographic traits were
conceived on a logical basis, i.e., they were perceived as being related to
attrition. Item topics included number of males in the household when
growing up, previous emotional reactions to time spent away from home,
and type of discipline received during teenage years.

Items were included on female-role ideology, using the concepts of
"traditional" and "contemporary" ideologies advanced by Lipman-Blumen
(1972). Approximately 80 percent of the enlisted women in the Navy are
assigned to traditional jobs. It was thus hypothesized that the less
traditional a woman was in her role orientation, the more likely she was
to attrite.

It was believed in the current study that poor mental health was
related to attrition (see, for example, Craighill (1947) and Schuckit and
Gunderson (1971)). Mental health items developed by Friedman (1956) were
used in Quest 1 and 2.

Horner (1969) found that the motive to fail characterizes the personality
of many women. It was believed in the current study that this motive
would lead ultimately to attrition. A story involving a hypothetical,
successful, woman Recruit Chief Petty Officer was included in the
questionnaires, as a projective device, to measure the motive to fail.

Porter  and  Steers'  review  article  (1973)  concluded  that  one  cause  of 
turnover  may  be  the  unrealistic  expectations  of  individuals  upon  entering  an 
organization.     Items  were  thus  constructed  which  asked  the  respondent  whether 
she had any relatives or friends in the military and whether she had
discussed their experiences with them.

Enlistment motivation was also tapped by the questionnaires. To
develop items, 50 women were interviewed who had recently been assigned to
their first duty station. Generally speaking, an "empirical" approach
was utilized, i.e., no hypotheses were advanced relating enlistment
motivation to attrition.

Items were designed to assess the similarity of the respondent to
previous successful recruits. A woman who had previously been a recruit
company  commander  was  interviewed  and  "success"  traits  identified,  such 
as  (1)  a  tendency  toward  conformity,   (2)  a  commitment  to  the  Navy  as  a 
career,  and  (3)  a  deliberative  rather  than  an  impulsive  decision-making 
style. 

It  was  believed  in  the  current  study  that  a  general  set  of  occupational 
needs may exist that are optimally compatible with Navy life. Individuals
possessing these needs would be more likely to experience job satisfaction
and perhaps less likely to attrite. Items were adapted for use from Hall's
Occupational Orientation Inventory (1971).

Sample and Data Collection. One of the two questionnaires was
administered to each recruit after she had been assigned to her company,
but before her actual training had begun. Twenty companies participated,
with a total N of 977. Questionnaires were administered in May, June,
and July of 1975 and a deadline of December 1976 established for
determining whether a woman was an attritee or a survivor.

Data  Analysis.    Analyses  were  conducted  for  individual  items  (referred 
to  below  as  the  "first"  and  "second"  analyses)  and  scales  (referred  to  as 
"third"  and  "fourth"  analyses.)     In  the  first  analysis,  chi-square  and 


^Since  the  study  was  an  exploratory  one,  the  "restriction  of  range" 
problem  inherent  in  using  a  recruit  sample  was  not  deemed  critical. 

^The  use  of  one  deadline  for  all  subjects,  instead  of,  for  example, 
using  an  18-month  lag  period  for  each  person,  was  not  viewed  as  serious 
because of the study's exploratory nature.

"strength of association" (SA) statistics were computed between each item
and attrition. The Cramer V was utilized as a measure of SA for
nominally-scaled items, while tau b and tau c were used for ordinally-scaled items.
Sample N's for the unique Quest 1, unique Quest 2, and common items were
485 (105 attritees and 380 survivors) (Sample A), 492 (99 attritees and
393 survivors) (Sample B), and 977 (204 attritees and 773 survivors)
(Sample C), respectively. As reported later, more unique Quest 1 items
emerged as significant in the chi-square analysis than did items of the
other types. Therefore, in the second analysis, (1) Sample A was divided
randomly in two, (2) ordinally-scaled items were identified in the first
subsample which evidenced a tau of .10 or greater, and (3) a regression
analysis was conducted with the second subsample utilizing (a) the identified
items as predictors and (b) the attritee-survivor status of the woman
as the criterion. A shrunken R was then computed.
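
To make the strength-of-association measure for nominally-scaled items concrete, the sketch below computes a Cramer V from a chi-square value using the standard definition V = sqrt(chi-square / (N * (min(rows, columns) - 1))). The function name `cramers_v` is hypothetical, and the 5 x 2 table shape is an assumption inferred from the reported 4 degrees of freedom and the two-category attritee-survivor criterion. The inputs are the chi-square and sample N reported for common item 20 (marital history).

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V for an n_rows x n_cols contingency table with n cases."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Common item 20: chi-square = 16.727 with 4 df, N = 977.
# Assumed shape: 5 response options crossed with attritee/survivor status.
v = cramers_v(16.727, 977, n_rows=5, n_cols=2)
print(round(v, 3))
```

Applying the same formula to other nominally-scaled items may give values slightly below those tabled; one plausible reason, offered here only as a conjecture, is that the number of usable responses for an individual item was somewhat smaller than the full sample N.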

In  the  third  analysis,  the  empirical  keying  approach  of  Campbell  (1971) 
was  utilized  to  identify  a  set  of  discriminating  response  options  or 
"scale",  i.e.,  a  set  of  response  options,  each  of  which  had  been  selected 
by  at  least  10  percent  more  attritees  than  survivors,  or  vice  versa. 
This  approach  was  utilized  separately  for  Samples  A,  B,  and  C.  The 
fourth  analysis  also  utilized  Campbell's  approach.     Samples  A,  B,  and 
C  were  each  randomly  divided  into  a  validation  and  cross-validation  sample. 
A  unit  weighting  scoring  system  was  devised  based  on  validation  sample 
data.     That  is,  a  plus  or  a  minus  1  was  assigned  to  a  discriminating  response 
option,  as  appropriate,  and  a  zero  to  non-discriminating  options.  This 
system  was  then  utilized  with  the  cross-validation  sample  to  produce  a 
scale  score  for  each  woman.    Scores  for  the  entire  cross-validation 
sample  were   then   correlated  with  attrition. 
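
The keying and cross-validation steps just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually programmed for the study: the function names (`build_key`, `scale_score`, `pearson_r`), the toy responses, and the reading of "10 percent more" as a difference of 10 percentage points between the attritee and survivor groups are all assumptions made for the example.

```python
from collections import Counter
import math


def build_key(responses, attrited, threshold=0.10):
    """Unit-weight key from a validation sample.

    +1 if the proportion of attritees choosing an option exceeds the
    proportion of survivors by at least `threshold`, -1 for the reverse;
    non-discriminating options are left out (weight zero).
    """
    n_att = sum(attrited)
    n_sur = len(attrited) - n_att
    att_counts, sur_counts = Counter(), Counter()
    for answers, left in zip(responses, attrited):
        counts = att_counts if left else sur_counts
        for item, option in answers.items():
            counts[(item, option)] += 1
    key = {}
    for pair in set(att_counts) | set(sur_counts):
        diff = att_counts[pair] / n_att - sur_counts[pair] / n_sur
        if diff >= threshold:
            key[pair] = 1
        elif diff <= -threshold:
            key[pair] = -1
    return key


def scale_score(answers, key):
    """Sum the unit weights of the options one respondent selected."""
    return sum(key.get((item, option), 0) for item, option in answers.items())


def pearson_r(x, y):
    """Product-moment correlation (point-biserial when y is 0/1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Toy validation sample: 4 attritees (1) followed by 6 survivors (0).
val_responses = [
    {"q1": "a", "q2": "a"}, {"q1": "a", "q2": "b"},
    {"q1": "a", "q2": "a"}, {"q1": "b", "q2": "b"},
    {"q1": "b", "q2": "a"}, {"q1": "b", "q2": "b"}, {"q1": "b", "q2": "a"},
    {"q1": "b", "q2": "b"}, {"q1": "a", "q2": "a"}, {"q1": "b", "q2": "b"},
]
val_attrited = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
key = build_key(val_responses, val_attrited)

# Toy cross-validation sample scored with the key built above.
cv_responses = [
    {"q1": "a", "q2": "a"}, {"q1": "a", "q2": "b"}, {"q1": "b", "q2": "a"},
    {"q1": "b", "q2": "b"}, {"q1": "b", "q2": "a"}, {"q1": "a", "q2": "b"},
]
cv_attrited = [1, 1, 0, 0, 0, 0]
scores = [scale_score(answers, key) for answers in cv_responses]
r = pearson_r(scores, cv_attrited)
print(key, scores, round(r, 3))
```

In the toy data only item q1 discriminates, so the key carries two weighted options and q2 contributes nothing to the scale score. Building the key on one sample and correlating scores in another is what keeps the resulting coefficient from capitalizing on chance differences in the keying sample.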

RESULTS


The  Relationship  Between  Items  and  Attrition 

Table 2 presents the results for all the unique Quest 1 items which
were significantly related to attrition. These items are broken down
into those which were ordinally-scaled and those which were
nominally-scaled. For ordinally-scaled items, the exact nature of their relationship
with attrition is specified; for example, the table indicates (see item 41)
that a woman is more likely to attrite if she values individuality as
opposed to conformity. For the nominally-scaled items, only the general
content of the item is supplied.

Twenty-one of the 57 items unique to Quest 1 demonstrated a statistically
significant relationship with attrition in the chi-square analysis
(p < .10).^ However, the absolute strength of these relationships was
weak, with obtained statistics varying from .008 (item 92, tau c) to .215
(item 52, tau b). Of the eight areas advanced at the start of the study
as possible indicators of attrition, only two, mental health and
occupational needs, produced a sizeable number of significant items in the
chi-square analysis. The relationship between the mental health items
and attrition was as hypothesized, i.e., the more the woman perceived
herself as nervous, headache-prone, etc., the more likely she was to
attrite. Although no hypotheses were advanced for the occupational need
items, a discernible need profile emerged for the attritee.

Table 3 presents the results for those items unique to Quest 2 which
were significantly related to attrition, while Table 4 presents the
results for significant common items. Eight of the 57 unique Quest 2 items
and nine of the 63 common items were significantly related to attrition,
although, once again, the actual strength of these relationships was weak.
Most of the significant unique Quest 2 items represented the mental health
area, while the significant common items represented the personal history
and enlistment motivation areas.

Even though strength of association statistics were generally low for
the significant items, the possibility existed that combining the items
in a multivariate fashion would increase their predictive value. An
exploratory analysis was conducted, therefore, composed of the following
steps: (1) individuals in Sample A (i.e., individuals completing Quest 1)
were randomly assigned in a 50-50 fashion to one of two subsamples, (2)
ordinally-scaled items were identified in subsample 1 which evidenced a
tau statistic > .10, and (3) these items were then utilized as predictors
with subsample 2 in a multiple regression analysis in which the
survivor-attritee status of the woman served as the criterion. Eighteen items
were  subsequently  identified  for  subsample  1.    Results  are  available  from 

^It  was  judged  that  this  significance  level  was  appropriate  for  an 
exploratory  study. 

Table 2

Unique Quest 1 Items Which Were Significantly Related to Attrition

Ordinally-Scaled Items

Item                                       Strength of      Nature of Relationship: Woman
Number   Category^a   χ²(df)        p      Association^b    More Likely to Attrite If She

41       SR           11.109(3)    .011       -.067         Values individuality
46       MH           13.903(1)    .001        .174         Reacts poorly to pressure
47       MH            5.261(1)    .022        .109         Is nervous person
52       MH           21.087(1)    .001        .215         Experiences many headaches
56       MH            3.637(1)    .057        .097
60       MH            5.140(1)    .023        .109
62       MH           19.687(1)    .001        .208         Trembles with anxiety
63       MH            5.366(1)    .021       -.114         Had poor childhood health
64       MH            7.579(1)    .006        .131         Has difficulty sleeping
66       MH           12.809(1)    .001        .168         Worries a lot
71       ON           12.908(4)    .012     [illegible]     Desires autonomy
73       ON            8.981(4)    .062        .109         Doesn't value novelty
76       ON           16.488(4)    .002        .106         Doesn't value work teams
77       ON            9.185(4)    .057        .064         Doesn't value caring supervisor
78       ON            8.698(4)    .069        .083         Doesn't value job respect
79       ON           14.376(4)    .006        .083         Doesn't value orderly procedures
89       ON            8.355(4)    .079        .031         Doesn't value set plans
92       ON            8.083(4)    .089        .008         Doesn't value interpersonally-
                                                            oriented jobs
94       ON           13.060(4)    .011       -.089         Enjoys working outdoors

Nominally-Scaled Items

Item                                       Strength of
Number   Category^a   χ²(df)        p      Association^b    Item Content

34       RE           19.630(4)    .001        .202         Group activities with males
35       FR           10.492(4)    .033        .150         Family religion

Note. N = 485.

^a Category abbreviations: RE = items on realistic expectations about Navy, FR = items on
female role ideology, SR = items on similarity to previous successful recruits, MH = mental
health items, ON = occupational need items.

^b For ordinally-scaled items, tau b was computed for the 2 x 2 situation, i.e., when the df
were 1, and tau c was computed in all other situations, i.e., when the df were greater than
1. For nominally-scaled items, a Cramer V was computed.


Table 3

Unique Quest 2 Items Which Were Significantly Related to Attrition

Item                                       Strength of      Nature of Relationship: Woman
Number   Category^a   χ²(df)        p      Association^b    More Likely to Attrite If She

34       MH           10.694(4)    .030       -.054         Had a neglectful mother
47       MH            5.619(1)    .018        .112         Is nervous
49       MH            6.445(1)    .011        .121         Had severe childhood punishment
52       MH            5.252(1)    .022        .110         Faints a lot
54       MH            6.574(1)    .010        .122         Believes she has bad luck
60       MH            9.112(2)    .011        .099         Is chronically tired
66       MH            6.753(1)    .009        .123         Becomes upset when yelled at
93       ON           10.558(4)    .032       -.002         Wants to travel

Note. N = 492. All items were ordinally scaled.

^a Category abbreviations: MH = mental health items, ON = occupational items.

^b tau b was computed for the 2 x 2 situation, i.e., when the df were 1, and tau c was
computed for all other situations, i.e., when the df were greater than 1.

Table 4

Common Items Which Were Significantly Related to Attrition

Ordinally-Scaled Items

Item                                       Strength of      Nature of Relationship: Woman
Number^a  Category^b  χ²(df)        p      Association^c    More Likely to Attrite If She

19       PH            9.688(4)    .046        .092         Dates infrequently
21       PH           15.578(4)    .004        .088         Plans to marry/remarry
111      EM            9.759(4)    .048       -.055         Doesn't want to travel/meet
                                                            people^d
113      EM            8.003(4)    .092       -.028         Doesn't want further education
115      EM           14.238(4)    .007        .096         Has relatives/friends in service
117      EM           18.138(4)    .001        .110         Wants to help family financially

Nominally-Scaled Items

Item                                       Strength of
Number^a  Category^b  χ²(df)        p      Association^c    Item Content

14       PH           12.240(4)    .016        .124         Childhood clubs
17       PH           15.038(4)    .005        .124         Types of male friendships
20       PH           16.727(4)    .002        .131         Marital history

Note. N = 977.

^a Item numbers were the same for both questionnaires.

^b Category abbreviations: PH = personal history items, EM = enlistment motivation.

^c All values for the ordinally-scaled items are tau c statistics. Cramer V's are entered
for the nominally-scaled items.

^d This item varied somewhat in content from Item 93 in Table 3,
perhaps accounting for the different results.

the senior author. A multiple R of .388 was obtained with subsample 2
and a shrunken R of .295 (Guilford & Fruchter, 1978, p. 377).
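
The idea behind a shrunken R can be illustrated with one widely used correction for the number of predictors, the adjusted R-square often attributed to Wherry: adjusted R² = 1 - (1 - R²)(N - 1)/(N - k - 1). The sketch below is an assumption-laden illustration and not a reproduction of the reported figure. The exact formula on the cited page of Guilford and Fruchter may differ in detail, the size of subsample 2 is not reported (242, about half of Sample A, is assumed here), and the function name `shrunken_r` is hypothetical. Under these assumptions the result falls in the high .28s, near but not identical to the reported .295.

```python
import math


def shrunken_r(r, n, k):
    """Multiple R corrected for k predictors fitted on n cases (Wherry-type adjustment)."""
    adjusted_r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    # A negative adjusted R-square is conventionally treated as zero
    return math.sqrt(max(adjusted_r2, 0.0))


# Reported multiple R = .388 with 18 predictor items; n = 242 is an assumed subsample size.
r_shrunk = shrunken_r(0.388, n=242, k=18)
print(round(r_shrunk, 3))
```

The correction grows as the number of predictors approaches the number of cases, which is why a regression on 18 items in a half-sample loses a noticeable share of its apparent predictive strength.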


The  Relationship  Between  Scales  and  Attrition 

As  described  previously,  response  options  were  identified  for  Samples 
A,  B,  and  C  which  discriminate  between  attritees  and  survivors.    In  any 
future  studies,  these  options  are  likely  to  be  the  most  stable  since  the 
entire sample was utilized in each case. Information on these options
is  available  from  the  senior  author. 

To  claim  that  these  options  are  the  most  stable  is  not  sufficient, 
however:    They  may  not  be  stable  enough  to  base  a  screening  instrument 
on  them.    Therefore,  an  attempt  was  made  to  obtain  some  quantitative  infor- 
mation.   As  described  previously.  Samples  A,  B,  and  C  were  each  divided 
into  a  validation  (V)  and  cross-validation  (CV)  sample.    A  set  of  dis- 
criminatitig  response  options,  or  scale,  was  identified  for  each  V  sample, 
the  three  scales  respectively  termed  (1)  the  Unique  Quest  1  Scale^  (2) 
the  Unique  Quest  2  Scale,  and  (3)  the  Common  Scale.    Options  identified  in 
the  V  sample  were  then  unit  weighted  in  the  CV  sample  and  a  scale  score 
computed  for  each  woman.    Correlating  such  scores  with  attrition  yielded 
cross-validation  coefficients  of   .  24  7  ,  .  149  ,and  .069  for  the  above-mentioned 
Unique  Quest  1,  Unique  Quest  2,  and  Common  Scales,  respectively. 
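
The unit-weighting and cross-validation step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the response matrix, the keyed options, and the attrition vector are made-up toy data, and the function names are hypothetical rather than taken from the study.

```python
import numpy as np


def unit_weighted_scores(responses, keyed_options):
    """Score each person by counting keyed options she endorsed.

    responses: (n_people, n_items) integer array of chosen options.
    keyed_options: dict mapping item index -> set of options that
    discriminated in the validation sample (each counts one point).
    """
    scores = np.zeros(responses.shape[0], dtype=int)
    for item, options in keyed_options.items():
        scores += np.isin(responses[:, item], list(options)).astype(int)
    return scores


def validity_coefficient(scores, attrited):
    """Pearson correlation between scale score and 0/1 attrition."""
    return float(np.corrcoef(scores, attrited)[0, 1])


# Toy cross-validation sample (hypothetical data).
responses = np.array([[1, 2, 0], [0, 2, 1], [1, 1, 1],
                      [0, 0, 0], [1, 2, 1], [0, 1, 0]])
key = {0: {1}, 1: {2}, 2: {1}}          # options keyed in the V sample
attrited = np.array([1, 0, 1, 0, 1, 0])

scores = unit_weighted_scores(responses, key)
r_cv = validity_coefficient(scores, attrited)
```

Because the key is fixed in the V sample and only applied in the CV sample, the resulting coefficient is not inflated by chance relationships in the data used to build the key.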


^Although restricted samples were utilized, one would generally expect
the reported correlations to be underestimates.
CONCLUSIONS AND RECOMMENDATIONS


As reported, a shrunken R of .30 was obtained between attrition and a
set of items measuring preenlistment variables. Moreover, a
cross-validation correlation of .25 was obtained between attrition and one
of the scales created through empirical keying. Both results suggest that
a moderately effective instrument can be constructed for screening current
female applicants, although the regression approach seems more promising.

If one examines Tables 2, 3, and 4, one sees that there are 25 items
with "strength of association" values ≥ .10. At a time when female
applicants are required to have a high school education, these items
represent the most logical choices for a selection instrument. It is
recommended, however, that researchers first evaluate these items from a
strict legal and practical standpoint. Some items, for example those on
occupational needs, pose no obvious problem. However, an item on family
religion could not ethically or legally be used. There may be some question
about using the mental health items, because they may represent an invasion
of privacy. Also, it may be possible to "fake" one's responses to the
mental health items. That is, if applicants are informed, as required by
the Privacy Act, that these items are being used to screen them, they may
falsify their answers.

Items which survive this evaluation should then be administered to
female applicants at the Armed Forces Entrance and Examination Centers,
along with the Armed Services Vocational Aptitude Battery (ASVAB). The
authors believe that the Navy is committed to using the ASVAB as its
primary screening device for the foreseeable future. The goal in the
proposed study, therefore, becomes one of determining whether
"preenlistment" questionnaire items significantly improve attrition
prediction over and above the ASVAB. (It was impossible in the current
study to include the ASVAB as a variable, since the Basic Test Battery was
the selection instrument used by the Navy when the study was conducted.)
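
One conventional way to test whether questionnaire items improve prediction "over and above" an existing predictor is the F test for an increase in R-squared between nested regression models. The sketch below assumes that approach; the paper does not specify the test to be used, and all numbers in the example are hypothetical.

```python
def f_for_r2_increase(r2_reduced, r2_full, n_cases, k_reduced, k_full):
    """F test for the gain in R^2 when predictors are added.

    r2_reduced: R^2 with the baseline predictors only (e.g. ASVAB).
    r2_full:    R^2 with baseline plus the added questionnaire items.
    Returns (F, numerator df, denominator df).
    """
    df_num = k_full - k_reduced
    df_den = n_cases - k_full - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("invalid degrees of freedom")
    f_value = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, df_num, df_den


# Hypothetical values: one baseline predictor, ten added items, N = 500.
f_value, df_num, df_den = f_for_r2_increase(
    r2_reduced=0.04, r2_full=0.10, n_cases=500, k_reduced=1, k_full=11
)
```

The resulting F would be compared against the critical value for (df_num, df_den) degrees of freedom; a shrinkage-adjusted or cross-validated estimate would still be needed before relying on the gain.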

The current study has additional implications for developing a screening
instrument for use when and if women no longer are required to have a high
school degree for acceptance into the Navy. That is, the study suggests
that the most valuable items for predicting attrition will probably come
from the mental health, occupational need, enlistment motivation, and
personal history areas (see Tables 2, 3, and 4).
Leader Sex, Leader Descriptions of
Own Behavior, and Subordinates'
Description of Leader Behavior


ABSTRACT 


In this paper the authors examine the relationship between
male and female leaders' descriptions of their own behavior
and the followers' descriptions of the leader's behavior in
traditionally male-oriented leadership positions.

The  data  were  collected  as  part  of  a  larger  research  effort 
to  assess  how  women  are  being  assimilated  into  the  Corps  of 
Cadets  at  West  Point,  and  how  the  women  are  being  trained  to 
become  effective  Army  leaders. 

During the summer of 1978, women cadets in the graduating
class of 1980 were assigned for the first time to
non-traditional platoon leader roles in predominantly male
subordinate units. Both male and female platoon leaders were
asked to describe their behavior using the Leadership
Opinion Questionnaire (Fleishman, 1960). Two composite scores,
Consideration and Structure, were the dimensions of leadership
behavior. Subordinates in the platoons were asked to describe
their leader's behavior on the same two dimensions,
Consideration and Structure.

The results were interpreted in terms of three major issues:

(1) the importance of sex roles as a leadership variable;

(2) the leaders' perceptions of what performance behaviors are
more important, Consideration versus Structure; and

(3) the subordinates' perceptions of what performance behaviors
are important in a platoon leader's role.
INTRODUCTION


The concern about how well women can perform in non-traditional
leadership roles has been a salient issue in the military,
particularly with the admission of women as cadets in the
service academies. As military planners and researchers began
to prepare programs for the development of women as future
Army leaders, little empirical research was available in
academic resources from which they could draw. Stogdill
completed a comprehensive review of leadership research in
1974; however, sex roles and leadership were not systematically
addressed. Terborg (1977) prepared a review of the literature
on women in management roles. Some studies prior to 1975
suggest a bias in psychology toward studying males rather than
females or both sexes (see Holmes and Jorgensen, 1971; Dan and
Beekman, 1972). Thus, military researchers and decision makers
need to be cautioned about the generalizability of conclusions
drawn from male-based research. Bender (1978) suggests that it
remains unclear whether the social psychological literature on
leadership is applicable to women as leaders.

This paper reports the results of a portion of a longitudinal
research program to assess how women are being assimilated
into the Corps of Cadets at West Point, and how effectively the
women are being trained to become Army officers.


RATIONALE  OF  THE  STUDY 

On October 7, 1975, President Ford signed into law Public Law
94-106, an amendment to which authorized women's admission
to the service academies, including West Point. As a result,
the academy developed operational plans for the admission of
women as cadets.

Four phases of the program, later titled Project Athena, were
planned:

- Preadmission phase to prepare cadets and the military
community for the arrival of women (Vitters and Kinzer,
1978).

- Integration phase, which included careful documentation of
how women were being integrated into the Corps of Cadets
(see Vitters and Kinzer, 1977, and Vitters, 1978).

- The Assimilation phase, which studies how well women are
being fully assimilated into the Corps of Cadets.

- The Graduate Assessment phase, which will study how well
women are performing their roles as officers.

The first two phases of Project Athena have been completed.
The latter two are continuing to be designed and studied.

DESIGN 


The design of this study involved five cadet companies where
women were assigned to non-traditional roles as platoon
leaders for the first time. The platoon leadership positions
were held for a four-week interval, after which a leadership
change would occur. Women platoon leaders were assigned to both
the first and second changeover details.

At the end of the summer training, all platoon leaders were
asked to describe their leadership behavior using Fleishman's
Leadership Opinion Questionnaire. At a separate location,
the subordinates were assembled to prepare peer ratings.
During this time the subordinates were also asked to describe
the behavior of the platoon leaders of each detail using the
same dimensions of Consideration and Structure.

Because there were only five women assigned in the
non-traditional role as platoon leaders, a matched group of five
men from the same units on alternate details was used (see slide
1). Thus, the subordinates rated both the male and the female
leader of the same platoon. The independent variables tested
in the design were:

Cadet Companies*
Details within Companies (nested)
Platoon Leader Sex

The dependent measures used were:

Scores on the dimension of Consideration
(Welfare of subordinates)
Scores on the dimension of Structure
(Ability to get the task done)

* The company designations 1 thru 5 are arbitrary to protect the
anonymity of the male and female leader participants.
FINDINGS:


In terms of how male and female leaders describe their own
behavior, there were no significant differences. That is, there
was no significant difference between male and female platoon
leaders in how they described themselves on the dimensions of
Consideration or Structure. The authors conclude that the sample
of only ten leaders was too small to detect any differences
between leaders on either of the criterion dimensions.

In the analyses where the subordinates described the leadership
behavior of their leaders, statistically significant effects
were noted. When the subordinates' ratings of Consideration were
used as the dependent variable, a leader sex main effect was
noted (see slide 2). The slide shows that the platoon members
perceived different behaviors on the part of male and female
leaders with regard to the leader's concern for the welfare
of the members.

However, because the significance tests do not provide any
information about the pattern of effects, a multiple
classification analysis was conducted to determine which sex
showed more concern for subordinates (Consideration). The
results of this analysis are presented in slide 3. The
deviations shown for the LEADERSEX variable reveal that it is
the female leaders whom subordinates perceive as having more
concern for the welfare of the troops.
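
The unadjusted part of a multiple classification analysis can be sketched as follows: each category's deviation is its mean minus the grand mean, and eta is the square root of the between-category sum of squares over the total sum of squares. The ratings below are hypothetical and the function name is illustrative; the adjusted deviations reported in slide 3 additionally control for the other factors and are not computed here.

```python
import numpy as np


def category_deviations_and_eta(y, groups):
    """Unadjusted MCA-style summary for one factor.

    Returns (deviations, eta) where deviations maps each category to
    its mean minus the grand mean, and eta = sqrt(SS_between / SS_total).
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    grand_mean = y.mean()
    deviations = {}
    ss_between = 0.0
    for level in sorted(set(groups.tolist())):
        values = y[groups == level]
        deviations[level] = float(values.mean() - grand_mean)
        ss_between += values.size * (values.mean() - grand_mean) ** 2
    ss_total = float(((y - grand_mean) ** 2).sum())
    return deviations, float(np.sqrt(ss_between / ss_total))


# Hypothetical Consideration ratings given by subordinates.
ratings = [10, 12, 11, 9, 14, 13, 15, 12]
leader_sex = ["male"] * 4 + ["female"] * 4
deviations, eta = category_deviations_and_eta(ratings, leader_sex)
```

A positive deviation for one category and a negative one for the other shows the direction of the effect, which the F test alone does not.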

In the analyses where subordinates were asked to describe
the leader behavior of their platoon leaders on Structure
(Task Accomplishment), there were no main effects due to
LEADERSEX. It is the authors' belief that the subordinates
described their platoon leaders as equally capable of getting
the task or mission accomplished. The multiple classification
analysis revealed no significant difference between LEADERSEX
categories for the Structure dimension (e.g., deviations of
-0.41 for males and 0.43 for females).


DISCUSSION: 

The results reported in this study are part of a larger
program which is trying to assess how well women are
assimilating into the Corps of Cadets. Part of the
assessment of full assimilation requires us to examine
how well women are objectively performing in new,
non-traditional roles as leaders and what the perceptions
are about the women leaders' performance.

The data in this study indicate that the leaders themselves
do not report any difference in how they see their
platoon leader roles. This may well be an artifact of a
methodology with too small a sample: 10 leaders.

The more promising results indicate that subordinates do
see male and female leader differences. Women are reported
to be more sensitive to the welfare of subordinates.
Perhaps one may associate a priori the feminine communal
values (sympathy, sensitivity, and consideration) with
behaviors one may expect to typically find in women leaders
(see Spence and Helmreich, 1974). It is important to note
that these behaviors are important for a leader, especially
one who will be expected to lead in an Army that
requires the integrated services of both men and women.

Subordinates also reported no difference in leader behaviors
between male and female platoon leaders in their
activities to accomplish the mission (Structure). In
this study, the authors are encouraged to find no
statistically significant differences due to sex. Had men
received higher subordinate scores on this dimension, one
could possibly infer that the men were more inclined
to get the job done than women.

The issues and concerns of how women are performing in new
non-traditional roles will continue to be studied. Objective
performance measures of how well women have performed
in these roles are still being analyzed. Finally, comparisons
of male superiors' attitudes toward women in the Army
and the superiors' evaluations of men's and women's performance
are also being analyzed to see if any sex bias in
evaluation of women leaders is unique to those male
superiors with traditional beliefs.
Cadet Company    Leader Sex    Training Detail

1                Female        First Four Weeks Training
                 Male          Second Four Weeks Training

2                Female        First Four Weeks Training
                 Male          Second Four Weeks Training

3                Female        First Four Weeks Training
                 Male          Second Four Weeks Training

4                Male          First Four Weeks Training
                 Female        Second Four Weeks Training

5                Male          First Four Weeks Training
                 Female        Second Four Weeks Training


* A sixth company was originally planned in the design;
however, the female who was designated to be the platoon
leader voluntarily resigned and the orthogonal block of
3 women first detail, 3 women second detail was lost.


SLIDE  1 


Independent  Variables: 


Company 

Training  Detail 
Leader  Sex 
*HIERARCHICAL ANOVA: CRITERION (CONSIDERATION)


SOURCE                    MEAN SQUARE       F       SIGNIFICANCE OF F

MAIN EFFECTS                 170.30        2.46         .025
  LEADERSEX                  786.97       11.36         .001
  DETAIL                       8.43        0.12         .999
  COMPANY                     56.61        0.82         .999

2 WAY INTERACTIONS
  LEADERSEX  COMPANY         151.96        2.19         0.088

EXPLAINED                    164.19        2.37         0.014
RESIDUAL                      69.27


SLIDE 2    Leader Sex Main Effect for Subordinates'
description of leader behavior of
Consideration


* Hierarchical approach (option 10) invokes the stepdown
procedure. The sum of squares associated with the main
effect for the first variable is not adjusted for any
other variables. The sum of squares for the main effect
for the second variable considered is adjusted only for
the first variable, and so on (see Nie et al., 1970).
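
The stepdown idea in the footnote can be illustrated with a small least-squares sketch: factors are entered one at a time, and each factor is credited with the drop in residual sum of squares at the moment it enters. This is a generic illustration of sequential sums of squares on made-up data, not a reproduction of the statistical package output in slide 2; the function names are hypothetical.

```python
import numpy as np


def residual_ss(y, columns):
    """Residual sum of squares after regressing y on an intercept plus columns."""
    design = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def dummy_columns(labels):
    """0/1 indicator columns for every level except the first."""
    labels = np.asarray(labels)
    levels = sorted(set(labels.tolist()))
    return [(labels == level).astype(float) for level in levels[1:]]


def sequential_ss(y, factors):
    """Sum of squares per factor, each adjusted only for earlier factors."""
    y = np.asarray(y, dtype=float)
    columns = []
    previous = residual_ss(y, columns)      # total SS about the mean
    result = {}
    for name, labels in factors:
        columns = columns + dummy_columns(labels)
        current = residual_ss(y, columns)
        result[name] = previous - current   # credit for this factor
        previous = current
    result["RESIDUAL"] = previous
    return result


# Hypothetical ratings; entry order is LEADERSEX first, then DETAIL.
y = [3, 5, 4, 6, 8, 7, 9, 10]
sex = ["m", "m", "m", "m", "f", "f", "f", "f"]
detail = [1, 2, 1, 2, 1, 2, 1, 2]
ss = sequential_ss(y, [("LEADERSEX", sex), ("DETAIL", detail)])
```

With unbalanced data, changing the entry order changes how shared variance is credited, which is why the order of variables matters under this approach.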
MULTIPLE CLASSIFICATION ANALYSIS


                                            ADJUSTED FOR
                         UNADJUSTED         INDEPENDENT VARIABLES
VARIABLE & CATEGORY      DEV'N    ETA       DEV'N    ETA

LEADERSEX
  1 MALE                 -1.70              -1.71
  2 FEMALE                1.80               1.80
                                  0.21               0.21

DETAIL                            0.03               0.02

COMPANY                           0.11               0.11


SLIDE 3    Multiple Classification Analyses
INTRODUCTION

As current Air Force policy has opened many traditionally all-male
specialties to women, it has become increasingly important to the Air
Force to have detailed management information concerning how females are
actually being utilized on the job. The Occupation and Manpower Research
Division of the Air Force Human Resources Laboratory is devising methods
of providing sufficient information to address present management
questions regarding female utilization and identifying problems which,
thus far, may not have received management's attention.

This report presents results of a probe study of female aircraft
mechanics and outlines what we hope to accomplish in our follow-on
in-depth analyses of the area. The probe study involved analysis of
on-hand data collected during a routine occupational survey conducted by
the Air Force Occupational Measurement Center (AFOMC).

METHOD 

Survey Instrument

The job inventory used in the survey was very comprehensive,
consisting of 977 task statements organized under 23 duties. It was
administered between May and September 1976, and usable data were received
from 5825 males and 206 females.

Analysis  Sample 

All of the women in our survey sample had been in the Air Force 43
or fewer months. This meant we could not simply compare male and female
first-termers without further sample selection and matching. Furthermore,
if we had restricted our samples to those who had been on board
between 8 and 43 months, there would have been a 3-month difference in
the average Total Active Federal Military Service (TAFMS) for males and
females. In order to do direct comparative analyses, we matched the two
samples on TAFMS case by case.
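
A case-by-case match on months of service might look like the sketch below. It is only an illustration under assumptions: the record layout, the identifiers, the random selection among eligible males, and the handling of months with too few males are all hypothetical, since the paper does not describe the selection procedure in detail.

```python
import random
from collections import defaultdict


def match_on_tafms(females, males, ratio=5, seed=0):
    """Match each female to up to `ratio` males with identical TAFMS months.

    females, males: lists of (person_id, tafms_months).
    Returns dict female_id -> list of matched male ids (no male reused).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for male_id, months in males:
        pool[months].append(male_id)
    for ids in pool.values():
        rng.shuffle(ids)                 # random pick among eligible males
    matches = {}
    for female_id, months in females:
        available = pool[months]
        matches[female_id] = available[:ratio]
        del available[:ratio]            # remove so nobody is matched twice
    return matches


# Hypothetical records: two women at 12 months, one at 30 months.
females = [("F1", 12), ("F2", 12), ("F3", 30)]
males = [(f"M{i}", 12) for i in range(7)] + [(f"M{i}", 30) for i in range(7, 12)]
matches = match_on_tafms(females, males)
```

In this toy example the second woman at 12 months receives only two matches because the pool runs out, which is the kind of shortfall that keeps a real matched sample from being perfectly five-to-one.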

The final sample analyzed in our probe study is shown in Table 1.
Our intent was to select males for a perfect month-by-month match with
females on a five-to-one basis. We later had to discard one female for
missing data, but this did not have a significant impact on the
equivalence of the matched sample.

Data  Quality  Control  Checks 

In  reviewing  the  survey  returns,  we  noticed  a  number  of  individuals 
claiming  to  be  females  who  had  distinctively  male  names.    We  therefore 
matched  the  entire  sample  against  the  Air  Force  Uniform  Airman  Record 
file  to  eliminate  possible  errors  in  sex  identification.    We  discovered 
that  approximately  one-half  of  one  percent  of  the  individuals  surveyed 
made an error in identifying their own sex. This is not a very high error
rate for a few hundred subjects. However, in a sample of
6000 males and 200 females, this leads to an intolerable error in
identification of the female subsample. In such an instance, 29 (or 12.7%)
of the 229 identified as females would actually be males.
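
The arithmetic behind this contamination figure can be reproduced with a short sketch. It assumes that the 29 comes from applying the one-half percent error rate to the 5825 surveyed males and that the 200 true females are all counted correctly; both assumptions are inferences from the figures in the text, not statements made by the authors.

```python
def apparent_female_contamination(n_males, n_females, error_rate):
    """Estimate how many apparent females are misreported males.

    Assumes only males misreport (females misreporting as male are
    ignored), which matches the simplification in the text.
    """
    misreported_males = round(n_males * error_rate)
    apparent_females = n_females + misreported_males
    share = misreported_males / apparent_females
    return misreported_males, apparent_females, share


# 5825 surveyed males, roughly 200 females, 0.5% self-report error.
males_wrong, apparent, share = apparent_female_contamination(5825, 200, 0.005)
# males_wrong is 29, apparent is 229, share is about 0.127
```

The same error rate is negligible for the male subsample but large for the female one, because the misreports flow from a group roughly thirty times bigger.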

RESULTS 

Job-Type  Analysis 

A very large number of job types and job-type clusters were identified
using the Comprehensive Occupational Data Analysis Programs (CODAP)
system. However, for simplicity, all of these could be classified as
being either hard-core maintenance jobs or support jobs. Representative
job types in these two categories are shown in Table 2. The title "Crew
Chief" is somewhat of a misnomer. Crew chiefs are not supervisors;
they are the flight line mechanics who perform primary aircraft
maintenance tasks. Note that 59% of the males and 44% of the females in our
sample were classified as crew chiefs. Some differences in job
assignment as a function of sex are apparent from data in this table. Only
5.5% of the males were working in support jobs, while 26.2% of the
females were working in jobs classified in this category.
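
A difference in proportions like this one is conventionally tested with a chi-square statistic on the 2 x 2 table of sex by job class. The sketch below is an illustration only: the cell counts are reconstructed by applying the total percentages in Table 3 (5.62% and 26.24%) to the analysis sample sizes in Table 1 (1015 males, 202 females), which is an assumption, so the result need not match the value printed in Table 2.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts reconstructed from percentages (assumed, see lead-in).
males_support = round(0.0562 * 1015)       # 57
females_support = round(0.2624 * 202)      # 53
chi2 = chi_square_2x2(
    males_support, 1015 - males_support,
    females_support, 202 - females_support,
)
```

With one degree of freedom, any value above 3.84 is significant at the .05 level, so a statistic of this size leaves little doubt that job class and sex are related in the sample.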

Information in Table 3 suggests that during the first 43 months,
there is a movement of individuals from maintenance to support jobs.
However, this flow appears to be much larger for females than males.
This implied difference is shown graphically in Figure 1.

It  can  be  seen  in  Table  4  that  women  in  maintenance  jobs  find  their 
work  at  least  as  interesting  as  men  and  express  at  least  equal  intent  to 
reenlist.    They  report  their  talents  as  being  slightly  less  well  utilized. 

Data in Table 5 suggest that there are differences in the work performed
by men and women working in support job types. Women spend more of
their time performing tasks which can be classified as administrative or
clerical in nature. However, there appears to be little difference in
the nature of tasks performed by males and females working in
maintenance jobs. Notice the small sex differences in time spent by male and
female maintenance personnel on tasks classified as being "heavy" or
"dirty." Also note that work performed by women in the support area is
rated by supervisors as being more difficult than that performed by men
either in the maintenance or support job areas.

Table 6 displays some of the differences in the duties performed by
men and women in support jobs. Women spend more time maintaining
records and forms. However, they also spend more time organizing, planning,
directing, and implementing, which are duties normally performed by
second-term personnel. The differences reported in this table are not
highly stable because of the small numbers of cases involved.

The data in Table 7 reflect differences in the duties performed by
men and women working in maintenance job types. This information should
be fairly stable, since approximately 74% of the women and 94% of the
men are working in these jobs. It appears that differences in the
utilization of men and women in maintenance jobs are very small.

In one sub-analysis, we identified a number of tasks which were
performed by men in the sample, but not by women. However, you can see
from Table 8 that not one of these tasks was being performed by as many
as three percent of the men, and approximately one-half of them were
being performed by less than one percent of the men. Inspection of these
tasks led to the conclusion that women could probably perform them, but
the small sample simply failed to pick up such cases.

Table 9 presents one of two major findings in the probe study. The
correlation of time spent on duties by males with that of females is .97
in the maintenance job types. Even more striking is a correlation of
.99 between the percent of males and females performing various tasks.
It appears that there is very little difference in the work performed
by males and females in maintenance jobs. The relationship between
the work performed by males and females in support jobs is considerably
lower. As mentioned previously, this may have been due in
part to unstable data as a function of sample size.
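
The kind of profile correlation described here can be sketched with the percent-time figures from Table 5. Note the assumptions: Table 5 reports six broad task classes rather than the duty-level or task-level data behind the coefficients quoted above, so this example only illustrates the computation and is not expected to reproduce those values.

```python
import numpy as np

# Percent time by task class, copied from Table 5 (order: Clerical,
# Heavy Maintenance, Light Maintenance, "Dirty" Maintenance,
# Inspect/Check/Troubleshoot, remaining non-clerical work).
maintenance_male = [5.6, 8.6, 14.6, 8.6, 37.2, 25.3]
maintenance_female = [6.8, 8.3, 13.9, 8.3, 37.2, 25.4]
support_male = [26.4, 2.5, 5.6, 0.8, 8.6, 56.0]
support_female = [46.7, 0.3, 1.2, 0.6, 3.2, 48.1]


def profile_correlation(profile_a, profile_b):
    """Pearson correlation between two percent-time profiles."""
    return float(np.corrcoef(profile_a, profile_b)[0, 1])


r_maintenance = profile_correlation(maintenance_male, maintenance_female)
r_support = profile_correlation(support_male, support_female)
```

Even on these coarse categories the maintenance profiles are nearly identical, while the support profiles agree less closely, which is the same pattern the text reports.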

Analysis of Aptitude Distributions

We now turn to a second significant finding from the probe study.
Table 10 reflects the aptitude requirement levels for entry into
aircraft mechanic specialties for various time periods. Note that prior to
June 1971 and since November 1975, a Mechanical Aptitude Index of 50 was
required for entry. However, between these two periods, during which
most of the individuals in the analysis sample came into the Air Force,
applicants could qualify on either the Mechanical or Electronics Indexes
at the 50th centile level.

Table 11 reflects gross differences in the Mechanical and
Administrative Aptitude Indexes for the male and female members in the analysis
sample. Notice that the mean Mechanical AI for females was 39.2, which
is considerably below the current 50th centile entry requirement level.
Also, it was approximately one and three-quarters standard deviations
below the mean score for males in the sample.


Presently there are approximately 2000 women working in the
Aircraft Maintenance specialty. This makes possible one of the most
definitive studies ever conducted on females working in a non-traditional
area. The Personnel Research and Occupation and Manpower Research
Divisions of AFHRL are presently completing details for a joint study of
female aircraft mechanics.

The joint study will involve analysis of the complete input female
sample to determine technical school success, attrition, retraining out
of the specialty, and other factors relating to residualization of the
group. For those still working in the specialty, we will evaluate their
utilization patterns at the task level, survey their attitudes, evaluate
their performance levels, and identify those who are moved from
maintenance to support jobs to determine why. We will analyze task
requirements for strength, stamina, and psychomotor skills and determine how
such tasks are performed by members of both sexes. We hope to
administer experimental tests of mechanical aptitude and validate them against
current and future performance information. We will analyze promotion
test scores and compare men and women supervisors. Finally, all cases
will be followed throughout their careers to determine career patterns
and attitude changes.

Our goal is to have sufficient information to address present and
future management questions regarding female utilization in the Air
Force.


The  distributions  of  scores  for  the  two  subsamples  in  Table  12  are 
even  more  striking.    Approximately  57%  of  the  females  in  the  probe  study 
scored  below  the  50th  centile  and  therefore  could  not  presently  qualify 
for  entry  into  the  Aircraft  Maintenance  career  field.    Yet,  results  from 
the  probe  study  indicate  that  75%  of  these  women  were  working  in  mechan- 
ical job  types  and  were  performing  essentially  the  same  tasks  as  males. 
Furthermore,  all  of  the  women  in  the  probe  study  had  successfully  com- 
pleted technical  training  courses  and  had  been  working  in  the  Aircraft 
Maintenance specialty for a number of months. However, this is a
residualized group. We don't know how many male or female cohorts
failed  to  graduate  from  school,  and  we  don't  know  how  many  of  them  were 
retrained  out  of  the  specialty.    Also,  the  present  study  is  solely 
concerned  with  the  time  being  spent  on  particular  tasks  by  males  and 
females.    It  does  not  address  questions  of  the  speed  or  quality  of 
performance.    All  of  these  issues  will  be  treated  in  the  follow-on 
study.    There  is  a  growing  body  of  evidence  that  mechanical  aptitude 
tests  which  have  historically  been  shown  to  be  highly  predictive  of 
success  for  males  may  not  be  appropriate  for  females. 

Table 13 lists some of the tests typically included in differential
aptitude  batteries  such  as  the  AQE  and  ASVAB.    Automotive  Information 
and  Shop  Information  are  primarily  measures  of  mechanical  experience. 
However,  for  the  male  population,  it  turns  out  that  experience  measures 
are  good  indicators  of  interest  level  and  of  ability  to  do  well  in  sub- 
sequent mechanical  training.    Because  of  cultural  differences,  the  same 
is  not  necessarily  true  for  females.    The  Personnel  Research  Division  of 
AFHRL  is  giving  high  priority  to  work  on  new  mechanical  aptitude  measures 
which  are  appropriate  for  women. 

CONCLUSIONS  AND  DISCUSSION 

1.  In  traditionally  all-male  specialties  recently  opened  to  females, 
only  a  limited  number  of  females  can  be  expected.    Caution  should  be 
exercised  by  those  who  may  be  extracting  data  on  females  from  survey 
studies  and  accepting  self-reported  sex  identification  as  being  accurate. 

2.  Care  should  be  taken  to  control  for  differences  in  lengths  of 
service  when  comparing  men  and  women  in  non-traditional  specialties. 

3.  There  is  very  little  apparent  difference  in  work  performed  by  males 
and  females  in  maintenance  jobs  within  the  431X1  Aircraft  Maintenance 
Specialty. 

4.  Females  appear  to  migrate  from  maintenance  to  support  jobs  more 
rapidly  and  in  higher  proportion  than  males. 

5.  Mechanical  aptitude  tests  highly  predictive  of  success  for  males 
may  not  be  appropriate  for  females  in  non-traditional  specialties. 

6.  Additional  research  will  investigate  questions  arising  from  the 
probe  analyses. 
Table 1:

SELECTION OF SAMPLE FOR UTILIZATION OF WOMEN STUDY
AIRCRAFT MECHANICS 431X1 C, E, & F


8-43 MONTHS TAFMS SURVEY SAMPLE

               MALES     FEMALES

N               1959       206
TAFMS M         25.8      22.9
S. D.           10.1       7.3


8-43 MONTHS ANALYSIS SAMPLE

               MALES     FEMALES

N               1015       202
TAFMS M         22.9      22.9
S. D.            7.2       7.3
Table 2:


DISTRIBUTION OF MALES AND FEMALES IN SUPPORT AND MAINTENANCE JOBS


SUPPORT 

MAINTENANCE 

JOBS 

JOBS 

Tech  Orders 

Crew Chiefs*

Training 

Inspection 

Documentation 

Special  Maintenance 

Safety 

Job Control

Deficiency Analysis

Bench Stock

CONTAINS:  CONTAINS: 

M  j  5.6kfoiales  of  males  I  x^=89.35,  (lf= 

(ZWofferaleS'  ?3.8?l  of  females)  p< 

'5W  of  all  males  and  lAUl  ail  females  work  in  "Crew  Chief"  jobs 


PERCENT MALES AND FEMALES IN "SUPPORT"
JOB TYPES BY TAFMS


TAFMS      % OF MALES    % OF FEMALES
0-12          3.74           8.70
13-24         3.69          17.92
25-43         8.65          43.84
TOTAL         5.62          26.24

Table 4:


JOB ATTITUDES BY SEX AND JOB-TYPE CLASS
431X1 UTILIZATION OF WOMEN STUDY *


                              SUPPORT              MAINTENANCE
                           MEAN     MEAN         MEAN     MEAN
ATTITUDE **                MALE    FEMALE        MALE    FEMALE

Job Interest               4.53     4.36         4.64     4.72
Utilization of Talent      3.18     2.75         3.21     3.00
Utilization of Training    2.70     2.52         3.53     3.54
Reenlistment Intent        1.98     2.27         2.30     2.36

* S. D. about 1.3 - 1.6 for attitude variables; about 1.00 for
Reenlistment Intent, which is defined as follows:
1 = No; 2 = Uncertain, Probably No; 3 = Uncertain, Probably
Yes; 4 = Yes


** No significant differences in t-tests between any male/female pairs.


Table 5:

PERCENT TIME SPENT ON VARIOUS CLASSES OF TASKS BY
MEN AND WOMEN IN "SUPPORT" vs "MAINTENANCE" JOB TYPES


                                    SUPPORT            MAINTENANCE
CLASS OF TASKS                   MALE    FEMALE       MALE    FEMALE

Clerical                         26.4     46.7         5.6      6.8
Heavy Maintenance                 2.5       .3         8.6      8.3
Light Maintenance                 5.6      1.2        14.6     13.9
"Dirty" Maintenance Tasks          .8       .6         8.6      8.3
Inspect, Check, Troubleshoot      8.6      3.2        37.2     37.2
Other (Support, Non-Clerical)    56.0     48.1        25.3     25.4

AVG. NUMBER OF TASKS PERFORMED   28.1     18.8       157.9    141.1 **
AVG. TASK DIFF. PER UNIT TIME     4.4      5.0         4.4      4.3

** p < .01; t-tests were not computed for class of task categories.
Table 6:

PERCENT TIME ON VARIOUS DUTIES FOR MALE AND
FEMALE PERSONNEL WORKING IN SUPPORT JOB TYPES


                                           % TIME    % TIME
DUTY                                        MALE     FEMALE    DIFFERENCE

Performing Supply Functions             [illegible] [illegible] [illegible]
Maintaining Forms and Records           [illegible] [illegible] [illegible]
Maintaining 780 Equipment                  15.29      2.70       12.59
Maintaining Non-Powered AGE Equipment      10.08      0.40        9.68
Organizing and Planning                     9.24     19.78      -10.54
Performing General Aircraft Maintenance [illegible]   3.30        3.40
Directing and Implementing                  6.42     12.65       -6.26
Training                                    4.94      4.60        0.34
Inspecting and Evaluating                   4.74      4.75       -0.01
Ground Handling of Aircraft                 3.67      1.55        2.12

SUBTOTAL                                   98.67     98.10
Table 7:

PERCENT TIME ON VARIOUS DUTIES FOR MALE AND FEMALE
PERSONNEL WORKING IN MAINTENANCE JOB TYPES


                                             % TIME    % TIME
DUTY                                          MALE     FEMALE    DIFFERENCE

G  Performing General Aircraft Maintenance   22.61  [illegible] [illegible]
H  Performing Ground Handling of Aircraft    20.03     21.63       -1.60
I  Maintaining Landing Gear Systems          11.24      9.88        1.36
M  Maintaining Electrical Systems             6.32      7.27    [illegible]
K  Maintaining Flight Control Systems         5.19  [illegible] [illegible]
?  Performing General Engine Maintenance      5.05      4.43        0.62
L  Maintaining Pneudraulic Systems            5.00      5.37       -0.37
E  Maintaining Forms and Records              4.79      5.55       -0.76
?  Maintaining Non-Powered AGE Equipment      4.45      3.80        0.65
N  Maintaining Fuel Systems                   4.17      4.04        0.13
J  Maintaining Utility Systems                3.23      3.14        0.09
F  Performing Supply Functions                2.46      2.51       -0.05
?  Inspecting and Evaluating                  1.54      1.41        0.13

SUBTOTAL                                  [illegible] [illegible]

Table 8:


PERCENT OF MALES PERFORMING TASKS NOT BEING PERFORMED BY FEMALES


PERCENT OF          NUMBER OF TASKS
MALES               NOT PERFORMED BY FEMALES

3.0 or more              0
2.0 - 2.9                6
1.5 - 1.9                9
1.0 - 1.4               41
less than 1.0           55

TOTAL                  111 *


* of 977 tasks in the inventory


Table 9:


CORRELATIONS BETWEEN MALE AND FEMALE WORK IN 431X1 C, E, & F


                                                    CORRELATION
VARIABLES                                           MALE VS FEMALE

% PERFORMING TASKS IN MAINTENANCE JOB TYPES              .97
% TIME SPENT ON TASKS IN MAINTENANCE JOB TYPES           .99
% PERFORMING TASKS IN SUPPORT JOB TYPES                  .59
% TIME SPENT ON TASKS IN SUPPORT JOB TYPES               .58


All correlations greater than 0 at .01 level.


APTITUDE REQUIREMENTS FOR ENTRY INTO 431X1

TIME PERIOD                      REQUIREMENT LEVELS

Prior to June 1971               Mechanical 50
June 1971 - November 1975        Mechanical or Electronic 50
Since November 1975              Mechanical 50

APTITUDE INFORMATION FOR ANALYSIS SAMPLE
431X1 UTILIZATION OF WOMEN STUDY


APTITUDE             MEAN      MEAN
COMPOSITES           MALE     FEMALE        t

Mechanical           68.9      39.2      21.55 **
Electronic           64.5      60.7       3.90 **
General              60.7      69.9       7.46 **
Administrative       47.6      63.4      10.79 **

** p < .01
DISTRIBUTION OF APTITUDE SCORES FOR ANALYSIS SAMPLE BY SEX

[Frequency counts by score level are illegible in the source.]

            MECHANICAL AI          ELECTRONIC AI
            MALE     FEMALE        MALE     FEMALE

MEAN        68.93     39.16        64.45     60.66
SD          16.29     17.96        15.86     11.66
MECHANICAL SUBTESTS IN AQE/ASVAB


Mechanical Principles
Automotive  Information 
Shop  Information 


SECTION  4 
GENERAL 


strain by prolonged duty hours and problems as to mobility
of soldiers - as seen by the Federal Armed Forces Association


By  Colonel  H."^.  Seuberlich. 


I.  Introduction.

The German Federal Armed Forces Association, founded in
1956 by 55 soldiers, is today a top organisation with
more than 230,000 members of all status groups and ranks,
and represents the professional and social interests of
servicemen. Its highest authority is the General Meeting.
From it the Federal Board receives its commission. The 10th
General Meeting passed the programme for the next four
years with 300 resolutions.

The individual resolutions reflect the manifold problems of the
servicemen of the Federal Armed Forces in the late 70s.
With their comprehensive fundamental programme, they will
be taking effect far into the next decade. Thus they are
already forming the image of the serviceman of the 80s.

Allow me to cite two crucial resolutions. One demands a
stock-taking of the "personal and social situation in
the Federal Armed Forces." Defence Minister Apel has
already introduced this in the meantime. The second resolution
which I want to mention outlines a general defence concept
for the Federal Republic of Germany. It includes, inter
alia, the demand for compulsory service in which women
should also be included.

In  this  consideration  the  goals  are: 

-  the improvement of conscriptional equality

-  securing the necessary personnel in the second half of the 80s,
if the number of those liable to military service begins to
fall as a result of the structure of age groups in
the Federal Republic of Germany

-  increased motivation for the military service.

These are indications of the future. Both should contribute
to removing the disturbance of the equilibrium which has
arisen in recent years, as measures and budgetary means have
been focused on the re-equipment and modernisation of the
German Federal Armed Forces with the arms systems of the
future. The Federal Armed Forces Association emphasizes that
despite technical perfection, the human factor should not be
neglected, because even the technically most advanced arms
system is only as effective as the serviceman who operates it,
and as his sincere willingness to do his best. This willingness
will be encouraged or impaired by:

-  his  social  situation 

-  the  strain  imposed  by  his  daily  service 

-  his job satisfaction

-  many  influences  from  the  outside  world,   especially  from 
his  own  family. 

His commitment in an emergency will be largely conditioned
by  the  knowledge  of  how  his  family  and  the  civil  population 
will  be  protected  from  attack. 


II.  Job  evaluation. 

1.  The factual situation of job evaluation in the public
service of the Federal Republic of Germany.

Since 1975 the basic principle of function-oriented pay
has been contained in the Federal Law of Payment. According to
this, the level of payment of civil servants, judges, and
soldiers should be determined according to the importance of
the functions they fulfil. Their functions should therefore be
properly assessed according to their requirements, and the
appropriate ranks assigned. The aim in this respect
is just payment in the civil service. That requires a
standardized scale of assessment for all departments of the
public service.

The Federal Ministry of the Interior is responsible for this.
It is working on the development of a relevant REFA system. A
first file of characteristics, for the civil service only,
has already been tested in 281 jobs. A second edition is
now being tested on a larger scale, until next year.

2.  Conception and cooperation of the German Federal Armed
Forces Association.

The existing files of characteristics do not yet record
the service characteristics of servicemen with their manifold
special demands and stresses, or do so insufficiently. It is
therefore also unclear how they should be evaluated. At its 10th
General Meeting the German Federal Armed Forces Association dealt
with these problems in detail and passed a draft conception. A
few basic principles from it:

-  The special requirements and stresses on servicemen, as opposed
to other sections of the public service, are to be evaluated
according to standardized scales.

-  The claim on servicemen's time, the frequent separation
from their families and the frequent change of duty must also
be considered.

-  It must be possible to consider fairly any probationary
periods or experience necessary.

-  A  comprehensive  analysis  of  the  requirements  of  all  posts 
in  the  Armed  Forces  will  be  necessary  for  this  evaluation. 

-  This  requires  a  job  specification  which  makes  the 
determination  of  a  verifiable  job  evaluation  possible. 
This  will  be  served  only  by  a  file  of  characteristics  with 
the  typical  characteristics  of  the  activities  in  the  armed 
services  for  a  standardized  listing.  This  file  is  to  be 
coordinated  with  the  files  of  the  other  sections  of  the 
public  service. 

The  army  chairman,   Colonel  Seuberlich,   has  been  commissioned 
with  the  representation  of  these  principles  within  the 
scope  of  the  present  activities  of  the  Ministry  of  the 
Interior. 


3.  The activities of the Federal Ministry of Defence.

The comprehensive analyses of the requirements for all posts
in the armed forces commissioned by the Federal Armed
Forces Association serve the projects "function analyses
of the personnel structure" undertaken by the Federal Armed
Forces Association, already known to you. They are based on
the suggestions made by the Commission on Personnel Structure,
which were also lectured on several times before the MTA.
The Federal Ministry of the Interior is aiming for an
agreement on the critical points with the Federal Ministry
of Defence by January 1979, so that servicemen can also be
included in the next trial period.

III.  Strain by prolonged duty hours.

1.  The  evolution  of  working  time  regulations. 

When the Federal Armed Forces were established in 1956 the
official working hours were 48 hours per week. Within 15
years this was reduced to a 42-hour week. In 1974 the 40-hour
week was introduced into the public service. The
Chancellor of the Federal Republic recognised this reduction
with a S% pay rise. Civil servants' overtime beyond this,
where it can be measured and cannot be compensated for
by free time, is reimbursed.

The reimbursement, which has been raised several times in
recent years, at the moment amounts to between 9.50 DM
and 18.50 DM per hour for civil servants of the various
pay categories.

2.  The situation of the serviceman.


For servicemen, however, neither the regulation of working
hours nor the payment of overtime has so far been
planned. As they receive no other compensation, their
social equilibrium is considerably disturbed as far as
wages are concerned: according to statistics submitted
by the German Federal Armed Forces Association to the
Lower House of the Federal Government in 1978, soldiers work

-  60%  regularly  up  to  50  hours  a  week 

-  12%  between  51  and  60  hours  a  week 

-  5%  over  60  hours  a  week 

Manoeuvres and military exercises, averaging 40 days
per year, are not included in these figures, although
they would in most cases bring the hours up to about 80
per week.

A general regulation of working hours for servicemen cannot
be reconciled with the necessary readiness for action of the
armed forces at the present level of personnel. Nevertheless,
this special stress must be entered in the file of
characteristics for servicemen, in order to record
unequivocally the disturbed social equilibrium and to further
the search for a possible solution. That such a solution
exists is shown by the example of policemen, who are
comparable to servicemen, whether in the Federal Border
Police or the police forces of the states of the Federal
Republic of Germany, with a 40-hour week and overtime pay.


3.  The ascertainment of normal duty hours for different
groups.

Here  it  is  neither  a  matter  of  the  introduction  of  a  40  hour 
week  for  servicemen,   nor  of  the  creation  of  the  basic 
requirements  necessary  so  that  servicemen  could  receive 
overtime  pay.  Both  would  be  a  gross  misunderstanding.  It 
would  lead  to  a  bureaucratized  army  and  to  the  time  clock 
serviceman,   which  the  Federal  Armed  Forces  Association 
decidedly  rejects. 

The Association proceeds rather from the assumption that in
modern armed forces the demands on the serviceman's time
arising from military exercises, manoeuvres and
duties of various kinds vary greatly in the course of a year.
They are, moreover, dependent on the different demands and
situations in the individual branches of the armed forces,
distributed among the SOLL personnel. It is therefore the
opinion of the Federal Armed Forces Association that these
connections should be investigated in detail, and that
the relevant conclusions should subsequently be drawn.
These could be concerned with organisation, finances and
personnel. They should, though, be concerned with the
"normal working hours" of whole units, and not those of
the individual serviceman. Thus 6 to 10 large groups will
be formed of servicemen exposed to similar or comparable
stress, and these groups would be examined in order to find
feasible, socially balanced and justifiable solutions.


IV.  Mobility 

Mobility is one of the characteristics of service in the
armed forces. For the serviceman it therefore entails the
obligation to allow himself to be transferred at any time,
to take part in training courses or to take on new duties.
This characteristic of service in the armed forces, often
linked with a change of base, affects about a quarter to
a third of all professional servicemen and "Zeitsoldaten"
(volunteers who sign up for a certain number of years) every
year. Unmarried people take this more or less in their stride,
but married servicemen are as a rule confronted with many
problems. The Federal Armed Forces Association, inter alia,
has investigated them with servicemen's wives in two symposia,
and the results can be summarized as follows:

-  Children are the worst affected; they are uprooted from
familiar surroundings, have to change schools, and lose friends
and other social ties. The slogan "if the father is transferred
the child has to repeat a year at school" characterizes, however,
only one aspect of the problem, in which the child is not able
to continue his education at a new school without breaks
or inconsistencies as a result of the confusion created by
the particularistic educational policy of Federal Germany.

-  Inner conflicts arise, possibly involving psychological damage,
whether it is the children or the wives who discover only
after a matter of years that they are unable to cope with
a constant change of address.

-  For children the frequent change of address entails a lasting
negative influence on their future careers, not counting the
inferior educational opportunities for school-leavers in the
economically weak areas in which military bases are often
situated.


-  The wives are also forced to make sacrifices which
are not asked of other women in our society. They are denied
the opportunity of self-realization through a career, towards
which women strive with increasing zeal as a result of their
new self-confidence.

-  Both the woman's share in the family's earning power
and her own occupational and social security are reduced
to a minimum.

-  Financial sacrifices, not counting those incurred by
moving house, are often caused by high rents, which
even pay rises through promotion do not cover; and a
transfer is not always coupled with promotion anyway.

Nevertheless, frequent transfers do not affect all servicemen
equally. They vary according to the different ranks and
duties.


V.  Possibilities  and  limits  on  the  conclusions  for  the 
consequences  of  strain  by  prolonged  duty  hours  and 
mobility. 

Both strain by prolonged duty hours and mobility are
characteristics of service in the armed forces. Both have
one thing in common - they are not directly connected
with a definite post and its demands. Both therefore
go beyond the framework of an analysis of the requirements
of a post in the services and its proper
assessment.


The situation is further complicated by the fact that the two
components are correlated with one another, at times closely.
Every new duty demands a period of vocational adjustment,
which usually also entails additional working hours. That
means that the frequency of the change of post increases
the strain by prolonged duty hours.

While "normal working hours" can be established according
to organisational fields, strain as a result of mobility
has to be established according to service ranks and
duty or training groups.

General possibilities for the alleviation of the negative
consequences of mobility could be found in:

-  more generous compensation for expenses entailed by
moving house.

-  measures for the standardization of rents in the
different bases.

-  assistance in the integration of families in new
bases in all the different areas of life.


VI.  Résumé and outlook

The phenomena of strain by prolonged duty hours and
mobility are noticeably becoming a problem for the
armed forces, because they are no longer accepted by
servicemen and their families as inevitable, but are
compared with the working conditions of others.

Organisers, personnel planners, and those who draw up plans for
training schemes and work timetables are therefore
cooperating more and more closely with one another. With
regard to general development in the working society,
it is impossible in the long term to try to explain
away social inequalities simply as the characteristics
of work in the armed forces, without seriously endangering
the necessary motivation.

In  a  file  of  characteristics  for  servicemen,  therefore, 
strain  by  prolonged  duty  hours,  and  mobility  must  also 
be  considered  as  factors  of  work  science. 

In its role as a social early warning system the German
Federal Armed Forces Association has for years been drawing
attention to the complexity of these problems and in the
future will also point out concrete possibilities for a
solution.




Computer Assisted Reference Locator (CARL) System:
An Overview¹


by 

William  A,  Sands 


Acquisition  and  Initial  Service  Program 
Navy  Personnel  Research  and  Development  Center 
San  Diego,  California  92152 


The 20th Annual Military Testing Association Conference
Oklahoma  City,  Oklahoma 
30  October  -  3  November  1978 


¹The opinions or assertions contained herein are those of the writer
and are not to be construed as official or reflecting the views of the
Navy Department.




INTRODUCTION


Background 

The  problem  which  originally  prompted  the  present  author's  interest 
in  the  field  of  information  retrieval  is  humorously  related  by  Redican 
(1973)  in  an  article  entitled  "Reprints:    File  Before  They  Defile  You," 
published  in  the  American  Psychologist.    He  states  that: 

Several  centuries  ago,  a  philosopher,  whose  name  now  escapes 
me, is said to have come to an untimely end when his overloaded
bookcase toppled over and buried him. Although the
condition  of  most  psychologist's  bookshelves  is  undoubtedly 
not  quite  so  lethal,  many  are  burdened  by  sizable  piles  of 
reprints,  manuscripts,  journals,  and  books  that  await 
filing  (July  1973,  p.  625). 

The creation, utilization, and maintenance of a reference retrieval
system was identified as a professional problem for the field of
psychology twenty-five years ago by Daniel and Louttit (1953). The
magnitude of the problem can be appreciated by the recognition that the
growth rate of scientific publications is exponential, with the number
of references doubling every 13-15 years (Price, 1961, 1963). This rate
of information production can be overwhelming to the individual
researcher who attempts to keep up with the literature in a particular
field. The initial attempt at imposing order on a growing personal
reference library often involves filing documents alphabetically by the
last name of the senior author. Inevitably there arises a situation
where the researcher knows that the library contains a reference on a
particular topic but cannot recall the author's name. Location of the
reference requires a sequential search of the alphabetical file of
source documents. With a small reference library, this sequential
search process is inconvenient. As time passes and the size of the
reference library grows, the sequential search strategy becomes
increasingly burdensome. Therefore, some alternative information
retrieval method is sought.
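
The doubling time quoted above can be restated as a yearly growth rate. The following is a minimal sketch, assuming smooth exponential growth at a constant rate; the function names and the 25-year span are illustrative choices and do not come from Price.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Constant yearly growth rate that doubles a quantity in `doubling_years`."""
    return 2 ** (1 / doubling_years) - 1


def growth_factor(elapsed_years: float, doubling_years: float) -> float:
    """How many times larger the literature is after `elapsed_years`."""
    return 2 ** (elapsed_years / doubling_years)


if __name__ == "__main__":
    for doubling in (13, 15):  # the doubling range cited in the text
        rate = annual_growth_rate(doubling)
        factor = growth_factor(25, doubling)  # 25 years is an illustrative span
        print(f"doubling every {doubling} years: "
              f"{rate:.1%} per year, x{factor:.1f} after 25 years")
```

Under this assumption a 13-15 year doubling time corresponds to growth of roughly 5 percent per year, which is why a personal reference library that merely keeps pace with its field outgrows a sequential file.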

Information  Retrieval  Systems 

Lancaster (1968), adopting a broad perspective, maintains that
information retrieval encompasses all the activities from the initial
acquisition and indexing of source documents to the search, retrieval,
and delivery of the results of a query to the user. He further points
out that an information retrieval system does not change the knowledge
of the user on any subject. Rather, the system simply informs the user
of the presence or absence of source documents on a particular topic and
the location of all pertinent documents.

A wide variety of information retrieval systems is available
(Bourne, 1963; Lancaster, 1968) including a manual system using ordinary
index cards, an edge-punch card system employing thin rods to sort and
retrieve information, and various computer-based systems. One manual
approach which has considerable merit is the "accession number
coordinated system" described by de Alarcon (1969), and, specifically, a
variation known as "Uniterm" introduced by Taube (1953) and discussed in
the American Psychologist by Broadhurst (1962). The coordinate indexing
method advocated by Broadhurst is described as follows:

The  procedure  involves  setting  up  a  separate  card  file  to  index 
the  collection  of  references,  these  being  filed  irrespective  of 
author  or  content  but  merely  according  to  a  serial  number 
assigned  to  each  reference  as  it  is  added — technically  the 
accession  number.    The  classification  of  the  reference  is  then 
done  by  selecting  certain  key  words  and  underlining  them 
('tracing').    This  is  most  conveniently  done  as  the  reference 
is  read,  and  can  be  done  on  the  card  or  the  associated  reprint 
if  you  have  one,  or  in  your  copy  of  the  journal,  so  long  as  the 
serial  number  given  to  either  of  the  latter  is  the  same  as  that 
shown on the reference card. These key words (or 'Uniterms')
which  have  been  created  by  the  tracing  procedure  can  be  as  many 
or  as  few  as  you  like,  depending  on  your  interests  and  the 
relevance of the material to them. If an important classification
does not occur in the text or reference, it can be added.
The  serial  number  of  the  reference  is  then  transferred  ('posted') 
to  cards  which  merely  bear  the  appropriate  Uniterms  as  headings. 
Proceeding in this way generates a personal psychological
vocabulary of Uniterms represented by cards each having the serial
numbers of references on them. Retrieval of information then
becomes simple. Consideration of any one Uniterm card will give
the  serial  numbers  of  all  the  references  which  deal  with  the 
subject  in  question.    For  two  subjects,  take  the  two  Uniterm 
cards, and compare them for coincidences of number. Such
coincidences of number indicate references which deal with both
subjects.    Making  a  series  of  such  cross  matches  of  Uniterm 
cards  will  yield  information  about  the  collection  of  references 
in as broad or as fine a detail as is required. (1962, p. 137).

Thus the term coordinate indexing is quite descriptive. Each Uniterm can
be considered as one of a set of classification coordinates. For any
pair of Uniterms, the accession number(s) at which the pair intersects
represents a reference(s) that has been classified as belonging under both
Uniterms.
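
The card procedure quoted above can be sketched in a few lines of code. This is a minimal illustration, assuming each Uniterm card is held as a set of accession numbers and that "comparing cards for coincidences of number" is a set intersection; the class name, method names, and sample Uniterms are hypothetical and are not taken from Broadhurst or from the CARL System.

```python
from collections import defaultdict


class UnitermIndex:
    """Coordinate index: one 'card' (a set of accession numbers) per Uniterm."""

    def __init__(self):
        self._next_accession = 1
        self.references = {}              # accession number -> citation text
        self.postings = defaultdict(set)  # Uniterm -> accession numbers posted

    def add(self, citation, uniterms):
        """File a reference under the next accession number and post it."""
        number = self._next_accession
        self._next_accession += 1
        self.references[number] = citation
        for term in uniterms:
            self.postings[term.lower()].add(number)
        return number

    def search(self, *uniterms):
        """Accession numbers that appear on every one of the given cards."""
        if not uniterms:
            return []
        cards = [self.postings.get(term.lower(), set()) for term in uniterms]
        return sorted(set.intersection(*cards))


if __name__ == "__main__":
    index = UnitermIndex()
    # Citations are abbreviated; the Uniterms are invented for illustration.
    index.add("Taube (1953)", ["indexing", "uniterm"])
    index.add("Broadhurst (1962)", ["indexing", "psychology"])
    index.add("Redican (1973)", ["reprints", "psychology"])
    for number in index.search("indexing", "psychology"):
        print(number, index.references[number])
```

A query on one Uniterm reads a single card; a query on two or more Uniterms narrows the result without any sequential pass over the source documents themselves.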


THE  CARL  SYSTEM 

Design  Objectives 

The Computer Assisted Reference Locator (CARL) System is a
computer-based information retrieval system which generally follows the
coordinate indexing approach described above. Some of the objectives
considered in designing the system were: (1) simplicity of reference
query and retrieval; (2) ease of system maintenance (additions,
deletions, corrections, etc.); and (3) adaptability for alternative
computer systems.

The first objective, query-retrieval simplicity, is primary. If a
system is too complicated, troublesome or time-consuming, a researcher
probably will avoid using it. From a practical standpoint, any
information retrieval system which is not used is worthless, regardless
of the creativity displayed in the system design or programming.

The  second  objective,  maintenance  simplicity,  is  also  important. 
If  the  system  is  so  complex  that  changes  to  the  existing  data  files  or 
the  addition  of  new  references  to  the  system  is  a  monumental  task,  the 
system  probably  would  be  expensive  in  terms  of  time,  costs,  or  both. 
Obviously,  this  expense  could  constrain  the  utility  of  a  reference 
retrieval  system. 

The third objective, adaptability, is a desirable characteristic as
it will facilitate the adoption and use of the CARL System by researchers
having access to a wide variety of computer systems with different
operating systems, internal and external storage conventions, and
input/output characteristics. The system design and the component source
programs of the CARL System have been developed with this third objective
in mind. The original version of the CARL System (described in this
paper) is a sequential access system, as distinguished from a direct
access system. Therefore, the data files which will be described below
could be stored either on magnetic tape or on disk. Most computer
systems include one or both of these storage media. The source programs
which incorporate the processing logic of the CARL System are written in
ASCII FORTRAN.² The use of FORTRAN should ensure a wide degree of
compatibility with different computers, as most systems support FORTRAN.³

Input  Information 

The information for each new reference is encoded for keypunching⁴
on four different forms. The first form is for headers, as shown in
Figure  1.    The  header  card  allows  space  for  up  to  four  authors  (last  name 
and  first  and  middle  initials),  the  year  of  publication,  and  the  first 
letter  of  the  reference  title.    In  addition,  there  is  space  for  a  five- 
digit  reference  number  and  a  one-digit  card  number.    Zero  is  the  card 
number  for  the  header  card.    Each  reference  has  only  one  header  card, 
regardless  of  the  actual  number  of  authors.    The  header  card  is  used  for 
sorting  operations  in  the  CARL  System. 

² A version of the FORTRAN language which handles the full American
Standard Code for Information Interchange character set.

³ The computer system used in the development of the CARL System is
a UNIVAC 1110, a general purpose, high performance, multiprocessor
system employing the EXEC 8 executive system. This computer system is
located at the Naval Ocean Systems Center in San Diego, California.

⁴ Actually, there are no system constraints which require card input.
The information could be entered via a computer terminal.

Figure 1. CARL System: Headers (keypunching form).


The  second  form  is  used  for  the  text  of  the  reference  citation. 
There  are  no  system  constraints  on  the  arrangement  of  information  within 
this text card (excluding the reference number and card number).
Figure  2  illustrates  the  conventions  adopted  by  the  present  author.  The 
first  line  of  text  begins  in  column  one  with  the  senior  author's  last 
name,  first  and  middle  initials.    This  is  followed  by  the  article  title, 
the  journal  name,  year,  volume  number,  and  page  numbers.    Note  that  the 
additional text cards differ from the first text card in that
information begins in column four. This is done to improve readability of
lists  of  references.    The  second  example  in  this  figure  illustrates  the 
encoding  of  a  technical  report  published  by  one  of  the  military 
personnel  research  laboratories.    The  authors  and  title  are  formatted  as 
above, followed by the report type and number assigned. Next, the
location  and  the  name  of  the  performing  organization  are  specified, 
followed  by  the  publication  date.    The  text  cards  for  a  single  reference 
are  assigned  card  numbers  from  one  to  six,  as  needed. 

The third type of form is for author information. As shown in
Figure 3, the format allows for four authors per card. If there are
more than four authors, a second author card is used. All author cards
are assigned the number seven as the card number.

The last form used to encode input data is for keywords. As shown
in Figure 4, up to four keywords are allowed per card. The number of
keyword cards for a single reference is unlimited, but each keyword card
is assigned eight as a card number. The only system constraint on
keywords is a length of eighteen characters. The present author has
developed a controlled vocabulary for use in assigning keywords to
references. The choice of keywords for a reference is the most crucial
aspect of encoding all input data. If many keywords having only a
remote relationship to the reference are assigned, the reference will be
retrieved frequently when it is not useful. On the other hand, if a
reference is not assigned critical keywords, it will be missed in a
reference search even though the material is pertinent to the user's
needs. The best balance between these two considerations will depend
upon the typical user of the system. The present author leans toward
overinclusion (i.e., too many keywords), to ensure that all pertinent
references  are  identified  in  a  search. 
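
The card numbering described in this section (zero for the header, one through six for text, seven for authors, eight for keywords) and the eighteen-character keyword limit can be sketched in a few lines. The sketch is illustrative only and is written in Python rather than the FORTRAN of the CARL System. The function names are hypothetical, and the trimming of surrounding blanks is an assumption that does not come from the paper.

```python
# Illustrative sketch only; names are hypothetical, not taken from the CARL System.

MAX_KEYWORD_LENGTH = 18      # system constraint stated in the text
MAX_KEYWORDS_PER_CARD = 4    # up to four keywords per keyword card


def card_type(card_number: int) -> str:
    """Map a one-digit card number to the kind of card it identifies."""
    if card_number == 0:
        return "header"
    if 1 <= card_number <= 6:
        return "text"
    if card_number == 7:
        return "author"
    if card_number == 8:
        return "keyword"
    raise ValueError(f"illegal card number: {card_number}")


def keyword_cards(keywords: list[str]) -> list[list[str]]:
    """Split a reference's keywords into groups of four, one group per card."""
    cleaned = [k.strip() for k in keywords]   # assumption: blanks are trimmed
    for k in cleaned:
        if not k or len(k) > MAX_KEYWORD_LENGTH:
            raise ValueError(f"keyword must be 1-18 characters: {k!r}")
    return [cleaned[i:i + MAX_KEYWORDS_PER_CARD]
            for i in range(0, len(cleaned), MAX_KEYWORDS_PER_CARD)]
```

Under these assumptions a reference indexed with five keywords would need two keyword cards, both carrying card number eight.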

Figure 4. Keyword Keypunching Form.

New References

The addition of new references to an existing library involves
twelve steps:

1. Alphabetize a new set of source documents by author(s) and
remove any duplicates.

2. Check new source documents against existing library to identify
apparent duplicates.

3. Determine unique/duplicate status of potential duplicates and
eliminate identified duplicates.

4. Code information on new documents onto keypunch forms:

A. Header form for Card #0

B. Text form for Cards #1-6

C. Author(s) form for Card #7

D. Keyword(s) form for Card #8

5. Keypunch new information (Cards #0-8) for each new source
document.

6. Verify keypunching of new information.

7. Sort new card deck by document number and card number.

8. Run edit program on sorted deck.

9. Correct any problems identified by edit program.

10. Proofread edit program output:

A. Header

B. Text

C. Author(s)

D. Keyword(s)

11.  Correct  any  problems  identified  by  proofreading. 

12.  Input  corrected  card  deck  and  update  system. 

Data Files

The CARL System data files may be divided into those primary files
accessed by the system during normal processing and those backup files
kept as insurance against disaster.

There are four primary data files in the CARL System.⁵

1.  RDXXXX. — The  raw  data  file  which  contains  all  the  raw  data  from 
the  header,  text,  author,  and  keyword  cards  for  each  reference  (i.e., 
cards  zero  through  eight),  arranged  sequentially  by  reference  number  and 
card  number. 

2.  KAXXXX.--The keyword-author file which contains all of the
keywords and authors with the associated reference numbers, arranged
alphabetically by keyword/author.


⁵ The XXXX portion of each data file name is a number indicating the
highest reference number currently incorporated into the system. For
example, RD0400. would indicate a data file based upon references 1-400,
inclusive.

3.  KDXXXX.--The keyword dictionary file containing all keywords
employed in the existing library, arranged in alphabetical sequence.


4.    ADXXXX.--The  author  dictionary  file  containing  all  the  authors 
employed  in  the  existing  library,  arranged  in  alphabetical  sequence. 
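
The file-naming convention for the four primary files can be sketched as follows. This is an illustration in Python, not the system's FORTRAN. The function name is hypothetical, and the four-digit limit is an assumption read off the XXXX pattern (the paper allows five-digit reference numbers but does not say how larger libraries would be named).

```python
# Illustrative sketch only; the function name and the 1-9999 range are assumptions.

FILE_PREFIXES = ("RD", "KA", "KD", "AD")  # raw data, keyword-author, two dictionaries


def data_file_names(highest_reference: int) -> dict[str, str]:
    """Return the four primary file names for a library of the given size."""
    if not 1 <= highest_reference <= 9999:
        raise ValueError("the XXXX field holds four digits")
    suffix = f"{highest_reference:04d}."      # e.g. 400 -> "0400."
    return {prefix: prefix + suffix for prefix in FILE_PREFIXES}
```

For a library holding references 1-400 this yields RD0400., KA0400., KD0400., and AD0400., matching the example given in the footnote.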

All four primary data files are backed up on magnetic tape.
Specifically, the raw data file (RDXXXX.) and the keyword-author file
(KAXXXX.) are kept on one reel and the two dictionary files (KDXXXX. and
ADXXXX.) are kept on another reel. Finally, the original punched cards
are saved so that the entire CARL System could be rebuilt if all the
disk files and magnetic tapes were destroyed.

Computer  Programs 

The computer programs in the CARL System can be divided into two
categories: (1) input preparation programs, and (2) system programs.
As the name implies, the purpose of the input preparation programs is to
clean up the input data prior to incorporating it into the CARL System.
The EDIT program reads the punched card deck (or a disk file) containing
all the raw data and prints it out in a format which facilitates visual
editing. In addition, the program checks for the following problems:
(1) cards missing or not in sequence within a reference, (2) reference
numbers not in sequence, (3) missing reference numbers, (4) duplicate
reference numbers, and (5) illegal reference numbers. Appropriate
diagnostic messages are printed as error conditions are encountered.
Finally, after processing all the input data, the number of errors
flagged  and  the  number  of  missing  references  are  reported. 
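
The five checks listed for the EDIT program can be sketched as a single pass over the sorted deck. The Python below is a hypothetical illustration, not the EDIT source. It assumes that every reference carries exactly one header card, at least one text card numbered consecutively from one, and at least one author and one keyword card, and that legal reference numbers run from 1 to 99999. None of these rules is stated in the paper.

```python
# Illustrative sketch only; the validity rules are assumptions, not the EDIT program's.
from itertools import groupby


def edit_check(cards, first_reference=1):
    """Check a deck given as (reference_number, card_number) pairs in deck order.

    Returns (messages, missing): diagnostic messages, and the reference
    numbers absent from the expected ascending sequence.
    """
    messages, missing = [], []
    seen = set()
    previous = first_reference - 1
    for ref, group in groupby(cards, key=lambda card: card[0]):
        numbers = [card_no for _, card_no in group]
        if not 1 <= ref <= 99999:            # five-digit field (assumed range)
            messages.append(f"reference {ref}: illegal reference number")
            continue
        if ref in seen:
            messages.append(f"reference {ref}: duplicate reference number")
        elif ref < previous:
            messages.append(f"reference {ref}: reference number not in sequence")
        else:
            missing.extend(range(previous + 1, ref))
            previous = ref
        seen.add(ref)
        # Assumed layout: header 0, text cards 1..k (k >= 1), then 7s, then 8s.
        text = [n for n in numbers if 1 <= n <= 6]
        expected = [0] + list(range(1, len(text) + 1))
        tail = numbers[len(expected):]
        layout_ok = (len(text) >= 1
                     and numbers[:len(expected)] == expected
                     and tail == sorted(tail)
                     and set(tail) == {7, 8})
        if not layout_ok:
            messages.append(f"reference {ref}: cards missing or not in sequence")
    return messages, missing
```

The closing report described in the text would then correspond to the lengths of the two returned lists.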

The duplicate identification program (DUPL) reads the raw data file
(RDXXXX.) and strips off all zero cards. These header cards are then
sorted on four fields in the following order of significance: (1) author,
(2) publication date, (3) first letter of title, and (4) reference
number. A sorted listing of the header cards is produced. Optional
outputs include: (1) identification and listing of all potential
duplicate references (i.e., references with identical header cards) and
(2) a comparison of potential duplicate references with a stored list of
apparent duplicates which have been previously identified as unique
references and a listing of the remaining potential duplicates.

The recommended keyword dictionary program (RKWD) reads a punched
card deck of keywords, arranges them in alphabetical order, and prints
out a dictionary of recommended keywords. This dictionary is designed
to  provide  initial  guidance  for  indexers  as  a  system  is  getting  started 
and  includes  only  a  few  references.  Later,  after  a  substantial  number 
of  references  has  been  indexed  and  incorporated  into  the  system,  a 
dictionary  of  actual  keywords  will  be  used. 

There are three system programs: (1) a system creation program
(BUILD), (2) a query-retrieval program (QUERY), and (3) a system
maintenance program (CHANGE). The BUILD program is used only once (assuming
the data files in the CARL System are not destroyed). As shown in
Figure 5, the BUILD program reads the raw data either from the original
punched card deck or a disk file containing the same information. The
data are edited and if an abortive error condition is encountered a
message is printed out and processing terminates.⁶ If no abortive
condition is found, the BUILD program creates four output files: (1) the
raw data file (RDXXXX.), (2) the keyword-author file (KAXXXX.), (3) the
keyword-dictionary file (KDXXXX.), and (4) the author-dictionary file
(ADXXXX.). Each of these four output files can be either a disk file or
a magnetic tape file, depending upon the computer system hardware
available and the costs of different storage modes. At present, a good
configuration appears to be having RDXXXX. and KAXXXX. created as disk
files and KDXXXX. and ADXXXX. created as magnetic tape files. The first
two data files will be needed almost every time the CARL System is used
but the other two files are needed only when a dictionary (either keyword
or author) is required or when certain types of corrections are being made
using the CHANGE program. Finally, separate printed dictionaries are
produced for keywords and authors.
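
A minimal sketch of how the keyword-author file and the two dictionaries could be derived from the raw data is given below. It is written in Python for illustration. The in-memory layout (a mapping from reference number to author and keyword lists) and the function name are assumptions, not the BUILD program's actual record formats.

```python
# Illustrative sketch only; data layout and names are assumptions.

def build_files(references):
    """Derive KA, KD and AD contents from raw reference data.

    references maps reference_number -> {"authors": [...], "keywords": [...]}.
    Returns (ka_file, keyword_dictionary, author_dictionary).
    """
    postings = {}
    keywords, authors = set(), set()
    for ref in sorted(references):                 # ascending reference number
        entry = references[ref]
        keywords.update(entry["keywords"])
        authors.update(entry["authors"])
        for term in entry["keywords"] + entry["authors"]:
            refs = postings.setdefault(term, [])
            if not refs or refs[-1] != ref:        # skip a repeated term
                refs.append(ref)
    ka_file = sorted(postings.items())             # alphabetical by keyword/author
    return ka_file, sorted(keywords), sorted(authors)
```

Each entry of the resulting keyword-author list pairs one keyword or author with its reference numbers, arranged alphabetically as the KAXXXX. description requires.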

As shown in Figure 6, the query-retrieval program (QUERY) uses the
keyword-author file (KAXXXX.) to respond to demand terminal queries from
a user. The desired keyword(s) is (are) typed on a terminal by the user.
The QUERY program examines the KAXXXX. data file to identify all references
with the appropriate keyword(s). A message indicating the number of
references located is sent to the user on the terminal. The user is given
the option of having the actual reference numbers listed or the entire
text of the reference citation(s) listed. If the user elects to have the
entire reference citation(s) listed, the RDXXXX. data file is required.
If, on the other hand, the number of references initially indicated is too
many, the user can narrow the scope of the search by specifying additional
keywords.
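
The narrowing behavior described above amounts to intersecting the reference lists of the requested keywords. The Python sketch below illustrates that idea with one sequential pass over a keyword-author list. The function name and data layout are hypothetical, and treating multiple keywords as a logical AND is an inference from the text rather than a documented rule of the QUERY program.

```python
# Illustrative sketch only; AND semantics and names are assumptions.

def query(ka_file, keywords):
    """Return the sorted reference numbers indexed under every given keyword.

    ka_file is a list of (term, reference_numbers) pairs in alphabetical
    order, read once from start to end as a sequential file would be.
    """
    wanted = set(keywords)
    found = {}
    for term, refs in ka_file:             # single sequential pass
        if term in wanted:
            found[term] = set(refs)
    if not wanted or len(found) < len(wanted):
        return []                          # no keywords given, or one not indexed
    return sorted(set.intersection(*found.values()))
```

Adding a keyword can only shrink the result, which is why specifying additional keywords narrows a search that returned too many references.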

The  maintenance  program  (CHANGE)  requires  access  to  all  four  data 
files (RDXXXX., KAXXXX., KDXXXX., and ADXXXX.) as shown in Figure 7. If
the  change  desired  involves  one  or  more  corrections  to  the  existing  system, 
the  appropriate  data  files  are  updated.    If  the  maintenance  activity 
involves  adding  new  references  to  the  system,  new  raw  data  are  input  (from 
cards  or  disk),  certain  editing  checks  are  performed  and  all  four  data 
files  are  updated.    Finally,  the  operator  has  the  option  of  obtaining  a 
post-change  listing  of  all  references  changed  (or  added). 

Query-Retrieval  Example 

The  following  example  is  presented  to  illustrate  the  way  in  which  a 
user  would  interact  with  the  CARL  System.    After  contacting  the  person 
managing the system, the user would examine a controlled dictionary of
all allowable keywords. This would enable the user to formulate the
retrieval request in a manner which will be meaningful to the system.


⁶ The operational definition of an abortive error condition can be
specified by the system manager.

Figure 5. Program BUILD Flowchart.

Figure 7. Program CHANGE Flowchart.

Armed with the legitimate keywords which are pertinent to the topic, the
user  is  ready  to  interact  with  the  QUERY  program  of  the  CARL  System, 
illustrated  in  Figure  6.    After  accomplishing  the  log-on  procedures 
required  to  establish  contact  with  the  computer  system,  the  user  would 
enter  the  keyword(s)  on  a  demand  terminal.    The  QUERY  program  will  locate 
all  references  which  have  been  indexed  with  that  keyword,  or  combination 
of  keywords,  and  the  number  of  references  located  will  be  displayed  on 
the terminal. The user is then asked if a list of reference numbers is
desired  and,  if  so,  the  reference  numbers  are  printed  out  in  ascending 
sequence.    Next  the  user  is  provided  with  the  option  of  having  the  entire 
reference  citation  printed  out  for  each  reference  located  by  the  search. 
If  the  user  wishes  the  entire  reference  citations,  a  hardcopy  list  will 
be  produced  in  alphabetical  sequence  by  author.    Finally,  the  user  is 
presented  with  the  option  of  continuing  the  interaction  with  the  CARL 
System  or  terminating  the  session  and  logging-off  the  computer  system. 


CONCLUSIONS 

Coordinate  Indexing 

The  coordinate  indexing  approach  to  information  retrieval  allows 
considerable  flexibility.    The  source  documents  contained  in  the  personal 
library  system  are  not  limited  to  published  material  available  to  the 
public  as  is  the  case  with  commercial  reference  retrieval  systems. 
Lecture  notes  for  teaching,  documentation  for  computer  programs,  printed 
advertisements, equipment brochures, and notes used to present briefings
are  examples  of  the  diversity  which  can  be  incorporated  into  a  personal 
reference  system.    In  addition,  the  coordinate  indexing  approach  has 
considerable  generality.    For  example,  in  the  work  setting,  coordinate 
indexing  could  be  employed  in  a  management  information  system  (MIS) 
designed  to  provide  military  laboratory  managers  with  current  information 
on all on-going research and development projects. The source document
for this MIS could be DD-1498 Forms. In a home setting, coordinate
indexing  could  be  used  to  create,  query-retrieve,  and  maintain  files  on  a 
minicomputer.    These  home  files  might  contain  a  photography  collection  of 
prints or slides or a record collection of albums or tapes.
Unquestionably there are many examples, both in the office and the home, where an
information  retrieval  system  employing  coordinate  indexing  could  be  quite 
useful. 

Future  Work 

Eventually  a  disk-oriented  version  of  the  CARL  System  will  be 
created.    This  second  version  of  the  CARL  System  will  use  direct  access 
methods  as  opposed  to  the  sequential  access  methods  used  by  the  original 
CARL  System  described  herein.    The  direct  access  version  will  require 
considerably more programming time to develop than the sequential access
version. However, once developed and debugged, the direct access version
should significantly speed up the information retrieval process while
simultaneously providing a substantial reduction in computer costs.
Further, the advantage of a direct access system over a sequential access
system will increase as the number of references in the system increases.

The possibility of a CARL System Network will be given consideration.
This  network  would  allow  individual  researchers  to  have  access  to  the 
personal  reference  libraries  of  other  researchers,  in  a  mutually  agreed 
upon  fashion,  thereby  increasing  the  number  of  references  examined  for 
any query.


The Army Reserve Officers' Training Program (ROTC) and the other
officer procurement programs have to produce a sufficient number of
officers to meet the requirements of the Army active and reserve
components. Projections of future requirements appear to indicate that
the ROTC program will need to double its number of graduates within
the next few years in order to remain responsive to this need. As a
consequence, the Professors of Military Science have been striving
diligently to meet this objective by enrolling increasing numbers of
students in the ROTC program. As in any personnel selection system,
this, in turn, makes it necessary to have a more stringent
evaluation of the quality of accessions as more emphasis is being
placed on quantity.

The  objective  of  this  research  was  to  evaluate  the  quality  of 
ROTC  graduates  and  to  determine  if  there  were  differences  among  ROTC 
graduates  in  performance  on  the  basis  of  sex  or  on  the  basis  of  the 
geographical region in which the ROTC units are located. The criterion
was  the  final  course  grades  in  the  Officer  Basic  Courses  (OBC)  of  the 
13  Career  Branches. 


Procedure 

A  sample  of  1,243  officers  who  completed  Officer  Basic  Course  in 
the  first  and  second  classes  after  15  June  1977  were  used  in  this 
research.    In  addition,  a  sample  of  4,662  officers  who  continued  on 
active  duty  after  completion  of  OBC  in  Fiscal  Year  1974  were  selected 
from  a  total  of  9,180  officers  who  entered  on  active  duty  during  that 
year. 


The views expressed in this paper are those of the author and do not
necessarily reflect the views of the Army Research Institute or the
Department of the Army.



Each sample was divided on the basis of source of commission: ROTC,
USMA, OCS, and direct appointment. The ROTC samples were divided into
those  officers  who  were  ROTC  scholarship  recipients  and  those  who  were 
not.    The  ROTC  sample  for  1977  was  divided  into  geographical  regions 
corresponding  to  the  location  of  the  ROTC  institution  that  they 
attended.    Finally,  the  1977  ROTC  graduates  were  divided  on  the  basis 
of  sex. 


Results  and  Discussion 

The means of the four groups of officers from the four procurement
programs are shown in Table 1 for the 1977 sample. Also, the
means of the different subgroups are presented (i.e., ROTC scholarship
recipients, ROTC regions, and male and female samples).

The result of the analysis of variance among the four procurement
programs was significant. The average final OBC grades for the four
procurement programs ranked as follows: U. S. Military Academy, ROTC,
OCS, and direct appointments. Even though the U. S. Military Academy
graduates were favored, a meaningful difference in mean performance
between that group and ROTC graduates was not obtained. When the ROTC
graduates are classified as ROTC scholarship recipients and
non-recipients, the mean Officer Basic Course final grades of the graduates of
the different programs ranked as follows: U. S. Military Academy
graduates, ROTC scholarship recipients, non-recipients of ROTC
scholarships, OCS graduates, and direct appointments.

When  an  analysis  of  variance  was  performed  to  detect  differences 
among  the  four  ROTC  regions,  a  significant  difference  was  obtained. 
The  Western  Region  was  favored  in  terms  of  average  OBC  final  grades 
earned while the South Central Region had the lowest average
performance. A significant difference did not exist between the mean
performance of male and female ROTC graduates on the criterion measure.

ROTC units that had five or more graduates in the 1977 sample
were ranked on the basis of average OBC final grades. Of the 70 ROTC
units ranked, the average OBC final grades of 18 exceeded the average
OBC final grades of graduates of the U. S. Military Academy. The
average OBC performance of graduates of 50 of the ROTC institutions
exceeded the average performance of OCS graduates, while the average
performance of graduates of 54 ROTC institutions exceeded the average
performance of those officers who received direct appointments.

The means of the Officer Basic Course final course grades for
each of the four procurement programs in the Fiscal Year 1974 sample
are shown in Table 2, as well as the mean performance of ROTC
scholarship recipients and non-recipients. Results of analysis of variance
revealed a significant difference in performance among the four groups.

TABLE 1

MEAN OFFICER BASIC COURSE FINAL GRADES
FOR THE DIFFERENT GROUPS IN THE
1977 SAMPLE OF OFFICER ACCESSIONS

Group                               N      Mean

U. S. Military Academy            113    106.90
ROTC                              871    100.34
OCS                               132     96.22
Direct Appointment                 61     94.37
Total                           1,177*   100.20

ROTC Scholarship Recipients              105.81
Non-Recipients                            96.72

Male ROTC Graduates               814    100.48
Female ROTC Graduates              57     98.30

Eastern ROTC Region               341     98.13
North Central ROTC Region         188    101.99
South Central ROTC Region         155     96.83
Western ROTC Region               170    106.14

*The remainder of the 1,243 cases were not used due to missing data elements.

TABLE 2

MEANS OF THE DIFFERENT PROCUREMENT
PROGRAM GROUPS IN THE FY 1974 SAMPLE

Group                               N      Mean

U. S. Military Academy            591     99.04
ROTC                            1,721    100.80
OCS                               113    106.10
Direct Appointment                 76     95.71
Total                           2,501    100.47

ROTC Scholarship Recipients       598    102.54
Non-Recipients                  1,123     99.87

Graduates of OCS were favored over U. S. Military Academy graduates and ROTC
graduates  while  those  officers  who  received  direct  appointments  had  the 
lowest  mean  OBC  final  course  grades. 
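
The Total row of Table 2 can be cross-checked against the group rows, since the mean of a combined sample is the N-weighted mean of its group means. The short Python sketch below performs that check. The N and mean values are copied from Table 2, while the function and variable names are hypothetical.

```python
# Illustrative check only; the values are copied from Table 2.

def pooled_mean(groups):
    """N-weighted mean of (n, group_mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (N, mean) for USMA, ROTC, OCS, and direct appointment in the FY 1974 sample.
procurement_groups = [(591, 99.04), (1721, 100.80), (113, 106.10), (76, 95.71)]

# (N, mean) for ROTC scholarship recipients and non-recipients.
rotc_subgroups = [(598, 102.54), (1123, 99.87)]

overall = round(pooled_mean(procurement_groups), 2)   # compare with the Total row
rotc = round(pooled_mean(rotc_subgroups), 2)          # compare with the ROTC row
```

Both pooled values agree with the tabled means to two decimals, which suggests the Table 2 entries are internally consistent.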

Those ROTC units that had five or more graduates in the Fiscal Year
1974 sample were rank-ordered on the mean Officer Basic Course final
grades of the graduates. Inspection of these means revealed that the mean
performance of 106 of the 235 institutions so ranked exceeded the
mean performance of U. S. Military Academy graduates. The average
performance of graduates of 37 ROTC institutions exceeded the average
performance of OCS graduates in this sample, while the average
performance of graduates of 105 ROTC institutions exceeded the average OBC
performance of officers who received direct appointments.

For the ROTC institutions that had five or more graduates both
in the 1977 sample and in the Fiscal Year 1974 sample, the mean
performance of the graduates was ranked within each sample. A Spearman
rank  order  correlation  coefficient  was  computed  between  the  obtained 
values.    The  resulting  correlation  coefficient  of  .53  between  these 
rankings  was  significant  at  the  .01  level. 
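
For readers unfamiliar with the statistic, a Spearman rank order correlation is the product-moment correlation between two sets of ranks. The Python sketch below shows one way to compute it, using average ranks for ties. It is a generic illustration and does not reproduce the computation actually used in this research; the function names are hypothetical.

```python
# Illustrative sketch only; a generic Spearman computation.

def average_ranks(values):
    """Rank values from highest (rank 1) to lowest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                          # extend the run of tied values
        mean_rank = (i + j) / 2 + 1         # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples of at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("ranks are constant; correlation is undefined")
    return cov / (var_x * var_y) ** 0.5
```

Applied here, x and y would hold each institution's mean OBC grade in the 1977 and Fiscal Year 1974 samples, respectively, with one pair per institution.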

The results of this research indicate that the ROTC program is
producing graduates whose performance in the Officer Basic Course is
comparable in quality with that of graduates of other officer
procurement programs. There appears to be variability among the ROTC
institutions in terms of the performance of graduates in the Officer
Basic Course, but even so, the ROTC is meeting its objective of obtaining
quality accessions for the officer corps. There appears to be a certain
tendency for ROTC institutions that produced graduates in Fiscal Year
1974 who performed well in Officer Basic Courses to do so again in 1977.


Prediction of Reading Grade Levels of Service Applicants from
Armed Services Vocational Aptitude Battery (ASVAB)

John J. Mathews and Lonnie D. Valentine, Jr.
Personnel  Research  Division 
Brooks  Air  Force  Base,  Texas  78235 

Wayne  S.  Sellman,  Major,  USAF 
Air  Force  Manpower  and  Personnel  Center 
(Research  and  Measurement  Division) 
Randolph  Air  Force  Base,  Texas  78148 


Background 

The General Accounting Office (GAO) submitted a report dated
31 March 1977 to the Secretary of Defense entitled "A Need to Address
Illiteracy Problems in the Military Services." Among other things,
it recommended that the Department of Defense develop a policy to
address the illiteracy problem and have the Services (1) determine the
reading grade level required for each military occupation, and (2)
establish an overall minimum reading level required for enlistment.

In a 10 June 1977 letter to the GAO, the Assistant Secretary of
Defense (Manpower, Reserve Affairs, and Logistics) concurred in
general with the findings of the report (i.e., illiterate service
personnel do have higher discharge rates, do experience more difficulty
in training, and do have less potential for career advancement) but
indicated that DOD's mission did not include the societal
responsibility for remedying any deficiencies in the American educational system.
Subsequent to the 10 June 1977 letter, other initiatives surfaced
which were directly related to the illiteracy problem. The House and
Senate Defense Appropriations Committees expressed concern about
in-service high school completion programs and the potential impact of
continuing to attempt to correct educational deficiencies of enlistees
after they enter the Service. The Committees believed instead that a
more efficient approach would be for potential enlistees with
educational weaknesses to receive basic skills training prior to enlistment.
Accordingly, the Secretaries of Health, Education, and Welfare (HEW)
and Labor, in coordination with the Secretary of Defense, were
requested to develop such a basic skills program.


Introduction 

The result of these initiatives was increased OSD emphasis on the
Services' literacy programs. In that regard the Principal Deputy
Assistant Secretary of Defense (Manpower, Reserve Affairs, and
Logistics) directed by memorandum, dated 18 October 1977, that a
study be conducted to evaluate the capability of the Armed Services
Vocational Aptitude Battery (ASVAB) to determine the reading ability
skills of applicants for enlistment at the Armed Forces Examining and
Entrance Stations (AFEESs). It was believed that because of its
highly verbal content, the ASVAB already indirectly measured reading
ability. If that was, in fact, the case, most applicants with low
reading skills were already being screened out. In addition, if a
reading grade index could be derived from ASVAB, estimates of
applicants' reading skills could be provided to Labor and HEW
representatives involved in the programs alluded to above.

Thus,  the  specific  objectives  of  this  study  were  to  assess  the 
reading  ability  of  applicants  for  military  service  as  well  as  for 
actual  accessions  and  to  determine  the  relationship  between  ASVAB 
measures  (Jensen,  Massey,  &  Valentine,  1976)  and  reading  scores. 
Depending on the magnitude of the relationship, an appropriate
combination of ASVAB subtests could be used to estimate the reading grade
level of groups of applicants and possibly to predict within a
reasonable confidence interval the reading grade level of individuals.
The  present  report  concerns  analyses  involving  two  reading  tests. 
Additional  data  covering  two  other  reading  tests  will  be  presented  in 
a  subsequent  report. 


Method 

Subjects 

The study plan called for testing 6,000 service applicants
divided among 25 geographically dispersed AFEESs. Four reading tests
were administered, the Gates-MacGinitie, Nelson-Denny, Basic Skills
Assessment, and Literacy Assessment Battery, with each subject taking
two of the tests. This report concerns all subjects given the
Gates-MacGinitie tests and a subsample who were also given the Nelson-Denny
test. In March-April 1978, 2,899 applicants were given the
Gates-MacGinitie test, and ASVAB scores were obtained for 2,432 of these. The
first sample consists of 2,033 of the 2,432 for whom sufficient
identification was available from reading and ASVAB data sources to
obtain accurate matches, and for whom most other data of interest
(e.g., sex, race, education) were also valid. A subsample consists of
818 of the 2,033 who were given the Nelson-Denny reading test in
addition to the Gates-MacGinitie. The second sample includes 212
subjects who took the Gates-MacGinitie and Nelson-Denny, but for whom
no ASVAB data were available. Reading data for these were compared to
those for the 818 to detect possible bias in the samples.

Predictors

An Applicant Processing Worksheet was available for most of the
subjects. ASVAB subtest scores and Armed Forces Qualification Test
(AFQT) percentiles were obtained from these documents. Other analysis
variables from the worksheets included military service applied for,
educational level, race, sex, and service qualification status,
qualification being a function of an applicant's meeting specified
minimum ASVAB and educational criteria. Sample percentages for
demographic variables are in Appendix A.

Criteria 

The reading tests involved in this report were the Gates-
MacGinitie Reading Tests Survey D (Gates & MacGinitie, 1965) and the
Nelson-Denny Reading Test Form C (Brown, 1973). The order of
administration of these tests was counterbalanced. Both tests contain a
vocabulary  and  a  reading  comprehension  subtest  which  were  separately 
scored.    The  published  test  norms  were  used  to  convert  the  reading 
test  raw  scores  to  reading  grade  level  scores. 

Statistical  Methods 

Statistical analyses included multivariate distributions and
correlation matrices. Because the two reading tests differ in range and
distribution, their reading grade levels have been summarized in most
instances by medians rather than means. The best combinations of ASVAB
subtests for predicting reading levels were determined via multiple
regression.


Results  and  Discussion 

Percentages of service applicants scoring at each reading grade
level as measured by the Gates-MacGinitie test are shown on the right
side of Table 1. The reading grade level range of the Gates-MacGinitie,
which is targeted at the 4th-6th grades, is from 2 to 11. The top reading
grade level, labeled "11 & above," contains the largest proportion
of applicants, 565 or 27.8% of 2,033. About 7.8% obtained reading grade
levels below four. The median reading grade level of applicants was
9.0.

Due primarily to aptitudinal and educational screening standards
employed by services, the reading grade levels of examinees meeting the
qualification standards of the service for which tested were usually
higher than those of examinees who did not qualify. The median reading
grade level of applicants qualifying for services was 10.2 compared to
5.7 for non-qualifying applicants.

Since each service has different screening standards and uses
different combinations of abilities, the aptitude and education
distributions vary across services for applicants and especially for
accessions. This is reflected in relatively higher reading grade
levels for Air Force and Navy applicants than for Army and Marine Corps
applicants. As indicated in Table 1, the median reading grade level
for applicants qualifying for the Air Force was 10.9 and the median
reading grade level for those qualifying for the Navy was 10.5, while
the median reading grade level for Army and Marine Corps qualified
applicants was 9.3 each.

The  impact  of  completion  of  high  school  on  reading  grade  level 
can  be  seen  in  Table  2  which  gives  percentages  of  graduates  and  non- 
graduates  at  each  reading  grade  level.    The  median  reading  grade  level 
for  high  school  graduates  was  9.8  compared  to  7.9  for  high  school  non- 
graduates.    The  effect  of  aptitude  screening  on  reading  grade  level  is 
also  evident  from  data  in  Table  2.    High  school  graduates  who  qualified 
for  services  had  a  median  reading  grade  level  of  10.6  while  high  school 
graduates  who  did  not  qualify  had  a  median  reading  grade  level  of  6.1. 

The Armed Forces Qualification Test (AFQT), which is used for
preliminary screening by all services, was correlated with the Gates-
MacGinitie. The correlation (r) between AFQT percentiles and reading
grade level was .74. For the Black applicants in the sample (N = 835)
the r was .68 (race and sex distributions of reading grade level appear
in Appendix B). To gauge the magnitude of this relationship, the
construct validity and reliability of the Gates-MacGinitie and the
reliability of AFQT must be considered. Due to less than perfect
reliability of these measures, their maximum intercorrelation would be
less than one.
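
The ceiling mentioned above can be illustrated with the standard result from classical test theory that the observed correlation between two fallible measures cannot exceed the square root of the product of their reliabilities. The sketch below is only an illustration: the reliability values of .90 are hypothetical placeholders, since the paper does not report reliabilities for either test, and the function names are introduced here for convenience.

```python
import math


def max_observed_correlation(reliability_x: float, reliability_y: float) -> float:
    """Upper bound on the observed r between two measures (classical test theory)."""
    return math.sqrt(reliability_x * reliability_y)


def disattenuated_correlation(r_observed: float,
                              reliability_x: float,
                              reliability_y: float) -> float:
    """Observed r expressed as a fraction of its ceiling (correction for attenuation)."""
    return r_observed / max_observed_correlation(reliability_x, reliability_y)


# Hypothetical reliabilities (NOT reported in the study).
ASSUMED_AFQT_RELIABILITY = 0.90
ASSUMED_READING_RELIABILITY = 0.90

ceiling = max_observed_correlation(ASSUMED_AFQT_RELIABILITY, ASSUMED_READING_RELIABILITY)
corrected = disattenuated_correlation(0.74, ASSUMED_AFQT_RELIABILITY,
                                      ASSUMED_READING_RELIABILITY)
print(f"ceiling = {ceiling:.2f}, corrected r = {corrected:.2f}")
```

Under these assumed reliabilities the ceiling would be .90, so the observed r of .74 would correspond to a corrected value of about .82; different reliabilities would give different figures.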

Data for a subsample of the 2,033 who had also taken the Nelson-
Denny reading test (N = 818) were analyzed for additional information.
The 818 appeared to be representative of the 2,033, with mean Gates-
MacGinitie reading grade levels of 8.6 and 8.4, respectively, and a
common standard deviation of 2.8.

The Nelson-Denny has a reading grade level range of from 6 to 15
and is targeted at about the 11th-13th grades. Table 3 contains
comparable data for samples for which Gates-MacGinitie and Nelson-Denny
data were analyzed. The median reading grade level for Nelson-Denny
was 9.5 compared to 9.0 for Gates-MacGinitie. While 32.4% of applicants
had Gates-MacGinitie reading grade levels of six or less, only 10.8% of
applicants had Nelson-Denny reading grade levels of six or less. The
mean AFQT percentile of those with reading grade levels of six or less
was 25.5 for Gates-MacGinitie and 31.9 for Nelson-Denny. The
correlation between Nelson-Denny reading grade level and AFQT was .65
compared to the r of .74 between Gates-MacGinitie and AFQT
(intercorrelations of reading tests, AFQT, and selected ASVAB subtests
are listed in Table 4). The r between the average of Gates-MacGinitie
and Nelson-Denny reading grade levels and AFQT was .76.

The intercorrelation between Gates-MacGinitie and Nelson-Denny
reading grade levels was .69. If these tests are measuring the same
ability (reading), then AFQT is also measuring reading with comparable
precision since AFQT correlates to about the same degree with Gates-
MacGinitie and Nelson-Denny as these reading tests do with each other.

AFQT is not the best ASVAB measure of either reading grade level,
however. Not surprisingly, the ASVAB subtest with the highest
relationship to reading scores was Word Knowledge (WK). This vocabulary
test correlated .73, .69, and .78 with Gates-MacGinitie, Nelson-Denny,
and the average of the two reading grade levels, respectively. Of the
other two subtests (besides WK) which form the AFQT, Arithmetic
Reasoning (AR) correlated substantially higher with reading grade level
than did Space Perception (SP). The r between AR and average reading
grade level was .62, compared to .35 between SP and average reading
grade level. This indicates that a composite of WK and AR (the General
Technical composite used by Army and Navy, and the General composite
used by Air Force) would be an even more valid predictor of reading
grade level than AFQT. The General Technical composite (GT) correlated
.76, .68, and .79 with Gates-MacGinitie, Nelson-Denny, and average
reading grade levels, respectively. Compared to the r of .76 between
AFQT and average reading grade level, GT accounts for about 8% more
variance in reading grade levels than does AFQT.
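
The "about 8% more variance" figure can be checked by squaring the two correlations, since r squared is the proportion of criterion variance accounted for. The short sketch below uses only the two correlations quoted above; the reading that "8%" is a relative rather than an absolute gain is an inference from the arithmetic, not something the paper states.

```python
def variance_explained(r: float) -> float:
    """Proportion of criterion variance accounted for by a predictor with correlation r."""
    return r * r


afqt_share = variance_explained(0.76)   # AFQT vs. average reading grade level
gt_share = variance_explained(0.79)     # GT vs. average reading grade level

absolute_gain = gt_share - afqt_share        # in proportion-of-variance units
relative_gain = absolute_gain / afqt_share   # gain relative to what AFQT explains

print(f"AFQT r^2 = {afqt_share:.4f}, GT r^2 = {gt_share:.4f}")
print(f"absolute gain = {absolute_gain:.4f}, relative gain = {relative_gain:.1%}")
```

The absolute gain is roughly 4.7 percentage points of variance (.6241 versus .5776), which is about 8% more than the variance AFQT accounts for.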

Based on multiple correlations (R's), the best two-subtest ASVAB
combination for predicting both reading tests consisted of WK and
Numeric Operations (NO), a clerical speeded subtest. The R's of WK
and NO were .77, .75, and .83 with Gates-MacGinitie, Nelson-Denny, and
average reading grade levels, respectively. The three-subtest ASVAB
combination which correlated highest with reading grade levels included
General Science (GS). The R's of WK, NO, and GS with Gates-MacGinitie,
Nelson-Denny, and average reading grade level were .80, .77, and .86.
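
For two predictors, a multiple correlation can be obtained directly from the three zero-order correlations involved. The sketch below applies that textbook formula to the WK and NO correlations as read from Table 4 (WK with NO = .49); it is offered as a consistency check under the assumption that those table entries are transcribed correctly, not as a reconstruction of the authors' regression runs.

```python
import math


def multiple_r_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple R of a criterion on two predictors, from zero-order correlations.

    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


R_WK_NO = 0.49  # correlation between the two predictors (Table 4)

# (r of WK with criterion, r of NO with criterion), as read from Table 4
criteria = {
    "Gates-MacGinitie": (0.73, 0.58),
    "Nelson-Denny": (0.69, 0.59),
    "Average": (0.78, 0.64),
}

multiple_rs = {
    name: round(multiple_r_two_predictors(r_wk, r_no, R_WK_NO), 2)
    for name, (r_wk, r_no) in criteria.items()
}
print(multiple_rs)
```

With these inputs the formula gives .77, .75, and .83, matching the R's reported for the WK and NO combination.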

The choice among commercial reading tests and some combination of
ASVAB measures as optimal for estimating reading grade levels of
service applicants should be based on considerations involving
fairness, difficulty levels, and administrative considerations as well as
validity and reliability. The reading tests (Gates-MacGinitie +
Nelson-Denny) correlated slightly higher with race than did AFQT
(-.44 vs. -.37). Minorities did relatively less well on both reading
tests than on AFQT. Gates-MacGinitie plus Nelson-Denny also had a
higher r with the dichotomous variable sex than did AFQT (.19 vs. .10).
Females scored higher on both AFQT and reading tests, but this sex
difference was less on AFQT.

Regarding difficulty levels, the form of Gates-MacGinitie used
would be appropriate for minimum cutoff scores around 4th-6th reading
grade levels. However, Gates-MacGinitie would be too easy for cutoffs
at the 9th reading grade level (used by the Air Force) or for accurate
estimates of group reading grade levels since the median of service
accessions was only one grade lower than the top Gates-MacGinitie
reading grade level. The Nelson-Denny form used would be too difficult
for use for cutoffs around the 4th-6th reading grade levels since the
sixth grade was the lowest Nelson-Denny reading grade level. The
ASVAB was developed for the service applicant population. The mean
item difficulty level (proportion of examinees correctly answering
items) is about .6 on AFQT and GT (uncorrected for guessing).

From  an  administrative  standpoint,  the  easiest  way  to  obtain 
estimates of reading grade level would be currently used ASVAB
composites (AFQT or GT). An unweighted combination of ASVAB subtests
(such  as  WK  +  GS  +  NO)  would  be  somewhat  less  convenient  and  probably 
not  much  more  valid.    A  weighted  composite  of  WK  +  GS  +  NO  would  give 
a  somewhat  better  estimate  of  reading  grade  level,  but  would  require 
additional  computations.    A  reading  grade  level  index  computed  from 
ASVAB  could  be  used  to  tailor  basic  skills  remediation  programs  to  the 
reading  levels  of  their  referrals. 
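
The trade-off between unweighted and weighted combinations can be explored from summary statistics alone. The sketch below estimates the correlation between a simple raw-score sum WK + GS + NO and average reading grade level, using standard deviations and correlations as read from Table 4. It assumes the unweighted composite is a plain sum of raw subtest scores, which the paper does not spell out.

```python
import math

# Standard deviations and correlations as read from Table 4 (N = 818).
SD = {"WK": 6.71, "GS": 3.91, "NO": 10.19}
R_WITH_READING = {"WK": 0.78, "GS": 0.74, "NO": 0.64}  # r with average reading grade level
R_BETWEEN = {("WK", "GS"): 0.71, ("WK", "NO"): 0.49, ("GS", "NO"): 0.49}


def unit_weighted_validity(sd, r_with_criterion, r_between):
    """Correlation of an unweighted raw-score sum with a criterion."""
    # Covariance of the sum with the standardized criterion.
    covariance = sum(sd[name] * r_with_criterion[name] for name in sd)
    # Variance of the sum: individual variances plus twice each covariance.
    variance = sum(s ** 2 for s in sd.values())
    for (a, b), r in r_between.items():
        variance += 2 * r * sd[a] * sd[b]
    return covariance / math.sqrt(variance)


validity = unit_weighted_validity(SD, R_WITH_READING, R_BETWEEN)
print(f"r(WK + GS + NO, average reading grade level) = {validity:.2f}")
```

Under these assumptions the unweighted sum correlates about .84 with average reading grade level, against the R of .86 reported for the optimally weighted combination of the same three subtests.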

The sample of 818 taking the Gates-MacGinitie and Nelson-Denny
tests  was  compared  to  212  who  also  took  these  tests  but  for  whom  no 
ASVAB  data  were  available.    It  had  been  speculated  that  many  of  those 
without  ASVAB  data  were  of  marginal  aptitude  and  did  not  return  to  take 
the  ASVAB  after  doing  poorly  on  the  reading  tests.    This  was  not  the 
case,  however,  as  the  mean  average  reading  grade  level  was  slightly 
higher  for  the  212  than  for  the  818  (9.8  vs.  9.4). 


Conclusions

The main findings of this study were:

1. The median reading grade level for service applicants was 9.0
based on Gates-MacGinitie and 9.5 based on Nelson-Denny. The median
Gates-MacGinitie reading grade level of applicants who qualified for
services was 10.2 compared to 5.7 for non-qualified applicants.

2. The AFQT correlated .74 with Gates-MacGinitie, .65 with
Nelson-Denny, and .76 with average reading grade levels.
Since the intercorrelation of Gates-MacGinitie and Nelson-Denny was
.69, AFQT appeared to measure reading as well as the reading tests.
The GT composite (General AI for Air Force) correlated .79 with average
reading grade level.

3. The multiple correlations between the three-subtest ASVAB
combination of WK, GS, and NO, and the Gates-MacGinitie, Nelson-Denny,
and average reading grade levels were .80, .77, and .86, respectively.

4.  ASVAB  is  presently  screening  out  most  applicants  with  marginal 
literacy  skills. 

Recommendations

The GT composite of ASVAB should be used as an index of reading
grade level. A conversion table can be developed for predicting
reading grade levels from GT scores.
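
One way such a conversion table might be built is with a simple linear regression of average reading grade level on GT percentile, using the means, standard deviations, and correlation listed in Table 4. The sketch below is an assumption about method, since the paper does not say how the table would be derived. The function names and the chosen GT values are illustrative only.

```python
import math

# Summary statistics as read from Table 4 (N = 818).
GT_MEAN, GT_SD = 54.33, 27.20      # GT percentile
RGL_MEAN, RGL_SD = 9.37, 2.55      # average reading grade level
R_GT_RGL = 0.79

SLOPE = R_GT_RGL * RGL_SD / GT_SD
INTERCEPT = RGL_MEAN - SLOPE * GT_MEAN
# Standard error of estimate: typical size of an individual prediction error.
STANDARD_ERROR = RGL_SD * math.sqrt(1 - R_GT_RGL ** 2)


def predict_reading_grade_level(gt_percentile: float) -> float:
    """Predicted average reading grade level for a given GT percentile."""
    return INTERCEPT + SLOPE * gt_percentile


def conversion_table(gt_percentiles):
    """List of (GT percentile, predicted reading grade level) pairs."""
    return [(gt, round(predict_reading_grade_level(gt), 1)) for gt in gt_percentiles]


for gt, rgl in conversion_table([10, 30, 50, 70, 90]):
    print(f"GT {gt:>2} -> predicted reading grade level {rgl}")
print(f"standard error of estimate = {STANDARD_ERROR:.2f} grade levels")
```

With these figures the standard error of estimate is about 1.6 grade levels, which suggests such a table would describe groups more dependably than individuals. Predictions near the ends of the GT range should also be read with the floor and ceiling of the two reading tests in mind.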

Table 1. Percentages of Qualified and Not Qualified Applicants by Service
at Each Gates-MacGinitie Reading Grade Level

Table 2. Percentage of High School Graduates and Non-Graduates at Each
Gates-MacGinitie Reading Grade Level by Qualified/Non-Qualified

Table 3. Comparison of Reading Grade Level and AFQT for
Gates-MacGinitie (N = 2,033) and Nelson-Denny (N = 818) Samples

Table 4. Means, Standard Deviations, and Intercorrelations of Variables for
Gates-MacGinitie + Nelson-Denny Subsample (N = 818)

Variable                                   Mean     SD     1     2     3     4     5     6     7     8     9    10    11    12    13
 1 Sex (a)                                 1.16    .37  1.00  -.01   .20   .10   .11   .19   .14   .02  -.11   .04   .15   .20   .19
 2 Race (b)                                1.43    .55  -.01  1.00   .02  -.37  -.38  -.30  -.34  -.35  -.22  -.37  -.41  -.40  -.44
 3 Education Level                        11.58   1.27   .20   .02  1.00   .28   .29   .25   .30   .24   .06   .31   .23   .36   .32
 4 AFQT Percentile                        50.06  22.52   .10  -.37   .28  1.00   .94   .57   .88   .82   .62   .73   .74   .65   .76
 5 GT Percentile                          54.33  27.20   .11  -.38   .29   .94  1.00   .58   .94   .83     ?   .68   .76   .69   .79
 6 NO                                     30.56  10.19   .19  -.30   .25   .57   .58  1.00   .49   .58   .28   .49   .58   .59   .64
 7 WK                                     18.63   6.71   .14  -.34   .30   .88   .94   .49  1.00   .62   .40   .71   .73   .69   .78
 8 AR                                     11.47   4.30   .02  -.35   .24   .82   .83   .58   .62  1.00     ?   .58   .60   .54   .62
 9 SP                                     12.0?   3.92  -.11  -.22   .06   .38     ?   .28   .40     ?  1.00   .43   .39   .25   .35
10 GS                                     10.31   3.91   .04  -.37   .31   .73   .68   .49   .71   .58   .43  1.00   .70   .67   .74
11 Gates-MacGinitie Reading Grade Level    8.60   2.82   .15  -.40   .23   .74   .76   .58   .73   .60   .39   .70  1.00   .69   .92
12 Nelson-Denny Reading Grade Level       10.09   2.73   .20  -.40   .36   .65   .69   .59   .69   .54   .25   .67   .69  1.00   .92
13 Average Reading Grade Level (c)         9.37   2.55   .19  -.44   .32   .76   .79   .64   .78   .62   .35   .74   .92   .92  1.00

(a) Male = 1, Female = 2
(b) Caucasian = 1, Minority = 2
(c) Average of Gates-MacGinitie and Nelson-Denny Reading Grade Levels for
    each subject.


APPENDIX  A 


Frequency Distributions of Variables for Gates-MacGinitie Sample
(N = 2,033) and Nelson-Denny Subsample (N = 818)

                     Gates-MacGinitie      Gates-MacGinitie + Nelson-Denny
                          Sample                     Subsample
                        N       %                    N       %
Service
  Army                851    41.9                  371    45.4
  Navy                507    24.9                  187    22.9
  Air Force           472    23.2                  195    23.8
  Marine Corps        203    10.0                   65     8.0
Race
  White             1,198    58.9                  508    62.1
  Black               835    41.1                  310    37.9
Sex
  Male              1,652    81.3                  688    84.1
  Female              381    18.7                  130    15.9
Qual. Status
  Qualified         1,459    71.8                  645    78.9
  Not Qualified       574    28.2                  173    21.1
AFEES
  Atlanta             273    13.4                  273    33.4
  Boston               27     1.3
  Cincinnati          175     8.6
  Dallas              271    13.3                  271    33.1
  Fresno               89     4.4
  Indianapolis        196     9.6
  Jacksonville         35     1.7
  New Orleans         193     9.5
  Oklahoma City       189     9.3                  189    23.1
  Philadelphia        446    21.9
  Pittsburgh           85     4.2                   85    10.4

APPENDIX B


Percentages of Applicants at Each Gates-MacGinitie Reading Grade
Level by Race and Sex


Reading Grade
Level               White    Black     Male   Female
11 & above           38.8     12.0     26.3        ?
10-10.9                 ?        ?        ?        ?
9-9.9                   ?        ?        ?        ?
8-8.9                   ?        ?        ?        ?
7-7.9                   ?        ?        ?        ?
6-6.9                 6.8     11.9      9.4      6.6
5-5.9                 4.9     15.6      9.8      7.4
4-4.9                 3.2     11.1      7.4      2.4
3-3.9                 2.3      7.1      4.8      1.8
2.9 & below           1.5      6.6      4.3      0.5
Total percent         100      100      100      100
Median Reading
Grade Level          10.3      6.8      8.6     10.0
Total N             1,198      835    1,652      381

The Content Issue in Performance Appraisal Ratings


by 

Randy  H.  Massey,  Captain,  USAF 
C. J. Mullins
James  A.  Earles 
Personnel  Research  Division 
Brooks  Air  Force  Base,  Texas 

Introduction 

Much research done on ratings has been concerned with efforts to
determine the best stimulus statements to use in a rating situation.
Unfortunately, in much of this research "best" has been defined in
terms of psychometric properties inherent in the ratings. Little
research has been done employing external criteria for evaluating rating
statements. This study focuses on the relative merits of rating
statements with content selected to represent different points on a
continuum from highly job-specific statements to person-oriented,
trait-like statements. A context was constructed which provides an
opportunity to evaluate the usefulness of various sets of rating
statements against criteria external to the ratings, rather than the more
traditional method of evaluating rating statements in terms of their
internal psychometric characteristics.

The  generally  accepted  viewpoint  is  that  the  more  specific 
observable  behaviors  are  more  accurately  rated  than  general  personality 
descriptive statements. This viewpoint appears to be based more on the
selective  appraisal  of  a  narrow  spectrum  of  studies  rather  than  on  an 
appraisal  of  all  studies  conducted  in  the  field  (Kavanagh,  1971).  In 
any  case,  the  difficulties  and  controversial  issues  inherent  in  ratings 
have  been  well  documented  (e.g.,  Barrett,  1966;  Kavanagh,  1971;  Ronan  & 
Prien,  1971;  Schmidt  &  Kaplan,  1971). 

Three prominent methodological procedures in developing rating
stimulus statements or evaluation attributes include the following
approaches: Behavioral Expectation Scales (Smith & Kendall, 1963);
multitrait-multimethod (Campbell & Fiske, 1959); and McCormick's (1957)
job analysis approach.

In the Behavioral Expectation Scales (BES) approach, important
performance dimensions are identified and defined by a group of
individuals responsible for evaluations. The scales are anchored by
actual job behaviors which represent specific performance levels. The
multitrait-multimethod approach uses data from many traits and raters
which are analyzed for convergent and discriminant validity. The
optimum stimulus statements should possess high convergent validity
correlation coefficients and low discriminant validity correlation
coefficients (Campbell & Fiske, 1959). McCormick (1957) emphasizes the
importance of using job-oriented and worker-oriented statements derived
from job analysis techniques. Job-oriented statements describe the job
content, or what is accomplished by the worker (repair water pump,
inspect lubrication system, drive pick-up truck, etc.). Worker-oriented
statements tend to characterize generalized human behaviors or worker
characteristics which are usually descriptive across many different jobs
(observe visual displays, judge condition or quality, manually pour
ingredients into container, etc.).

Perhaps  the  most  popular  scaling  procedure  designed  to  measure  job 
performance  is  the  BES  methodology  developed  by  Smith  and  Kendall  (1963). 
BES  has  had  considerable  intuitive  appeal,  and  there  have  been  many 
proponents of the technique (e.g., Campbell, Dunnette, Arvey, & Hellervik,
1973; Campbell, Dunnette, Lawler, & Weick, 1970; Dunnette, 1966; Landy,
Farr, Saal, & Fretag, 1976; and Zedeck & Blood, 1974). BES scales have
also been developed for a variety of occupations (e.g., Arvey & Hoyle,
1974; Landy et al., 1976; Smith & Kendall, 1963). However, a review of
studies  in  which  BES  was  compared  to  other  formats  does  not  provide 
support  for  the  effectiveness  of  the  BES  methodology  (e.g.,  Buranaska  & 
Hollmann,  1974;  Dickenson  &  Tice,  1973;  Zedeck  &  Baker,  1972;  Borman  & 
Vallon,  1974). 

Intrinsic to the BES methodology is the assumption of the
superiority of behavior-based attributes over trait-oriented attributes.
McCormick's (1957) job analysis approach assumes the superiority of
behavior-based attributes as well as task-oriented attributes. The
multitrait-multimethod approach is the only methodology that does not
implicitly assume the superiority of behavior-oriented attributes over
trait-oriented attributes. In fact, both types of attributes have been
found  to  be  effective  in  performance  evaluation  devices  (Kavanagh,  1971) 
when  employing  the  multitrait-multimethod  approach.    Considering  the 
popularity  of  behavior-oriented  statements,  it  is  not  surprising  that 
the  common  belief  is  that  behavior-based  rating  statements  are  superior 
to  trait-oriented  statements.    Nevertheless,  there  is  no  comparative 
evidence  to  indicate  the  superiority  of  any  of  the  aforementioned 
methodologies. 

A common issue underlying all rating methodological approaches is
the "content issue," defined by Kavanagh (1971) as "the issue of the
relative representativeness of traits . . . along a continuum ranging
from subjective to objective, abstract to concrete, or personality to
performance." He concluded that there is no overwhelming evidence to
indicate the superiority of behavior-based over trait-oriented
dimensions. He further suggests that contradictory findings across
reliability and validity studies could be partially attributed to a
failure to resolve or control for the "content issue." Resolution
of  this  issue  may  give  insight  into  the  effectiveness  of  various 
performance  evaluation  methodologies,  particularly  in  relation  to  time 
and  cost  expended.    Settlement  of  this  issue  can  also  have  significant 
explanatory  value  accounting  for  the  numerous  contradictory  findings 
that  exist  in  performance  appraisal  research. 


Kavanagh, MacKinney, and Wollins (1971) were the first to directly
address the content issue, using the multirater-multimethod approach,
by investigating middle managers using performance ratings from
superiors and two subordinates. They found more convergent validity
for personal traits than performance traits, but no difference for
discriminant validity. Although the higher personal trait convergent
validity was accompanied by a greater degree of "halo," the overall
conclusion was that ratings of personal traits did as well as the
ratings of performance traits.

Since Kavanagh (1971), the content issue has been almost entirely
ignored. Recently Borman and Dunnette (1975) attempted to resolve the
content issue by comparing behavior-based statements with trait-
oriented statements. Their conclusions were, "at present little
empirical evidence exists supporting the incremental validity of
performance ratings made using behavior scales." Unfortunately, there
are methodological problems associated with their study. They compared
three different rating systems (performance anchored, performance
non-anchored, and trait-oriented statements obtained from the Naval
Officer Fitness Report), rather than just comparing three rating formats.
In sum, the study did not directly focus on the content issue of rating
criteria, but rather on the effectiveness of three different rating
systems. Among other experimental difficulties, they compared different
numbers of rating statements between treatments and included trait-like
statements (integrity, responsibility, and dedication) within the
performance treatment category.

It seems clear, then, that the issue of the preferred content for
rating statements has in no way been resolved by previous research.
This study is one in a series of studies using criteria external to
the ratings to attempt such a resolution. It is anticipated that this
approach will be more effective in resolving the content issue than were
past studies that employed internal characteristics of the rating
instrument as criteria for judging the excellence of rating statements.


Method 

Sample 

One  hundred  and  twenty  students  assigned  to  the  ATC  NCO  Academy 
at Lackland AFB Annex completed the rating tasks. The study included
nine separate seminar groups, each consisting of 13 or 14 NCOs (E6s to
E7s) whose length of military service was 10 to 17 years.

Rating Scales

The treatment conditions in this study varied across three
different types of rating statements (task-oriented, worker-oriented, and
trait-oriented). Ten rating statements representing each of the three
different kinds of rating content were included in the study. These
were determined by consultation with instructors, administrative
officials, and students. Previously conducted studies were also
reviewed to identify factors. Each of the 10 rating attributes was
rated on a 5-point scale as follows:


Attribute

Trait-oriented attributes also included a brief descriptive definition.
See Appendix A for a complete list and description of the rating
statements.

Rating  Tasks 

The research was conducted in two phases. In Phase I, each student
rated all members in his seminar group on one and only one of the three
different types of statements: task-oriented, worker-oriented, and
trait-oriented. This phase resulted in the generation of individual
profiles based on the group's evaluation of each member on each of the
10 selected rating attributes.
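
A profile of this kind can be pictured as one summary value per attribute for each ratee. The sketch below assumes the summary is the mean of the 5-point ratings received from peers, which the paper does not state explicitly. The ratings are invented, and only three attributes are shown instead of ten to keep the example short.

```python
from statistics import mean


def build_profile(ratings_received):
    """Average the ratings one ratee received, attribute by attribute.

    ratings_received: one list per rater, each holding one 1-5 rating
    per attribute, in the same attribute order.
    """
    n_attributes = len(ratings_received[0])
    return [
        round(mean(rater[i] for rater in ratings_received), 2)
        for i in range(n_attributes)
    ]


# Hypothetical ratings of one seminar member by three peers on three attributes.
ratings_for_member = [
    [5, 3, 4],
    [4, 3, 2],
    [4, 3, 3],
]

profile = build_profile(ratings_for_member)
print(profile)
```

Each ratee ends up with one such vector, which can then be handed out without a name attached, as in Phase II.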

In Phase II, about 2 weeks later, the experimenter handed out the
profiles to the seminar group without an identifying name on the
profiles. Each subject was required to perform three tasks: first, he
had  to  rank-order  the  profiles  according  to  predicted  seminar  class 
rank;  second,  he  had  to  identify  to  whom  each  profile  belonged;  and 
third,  he  had  to  predict  the  final  school  seminar  class  rank  of  his 
seminar  peers  without  any  regard  to  profile  considerations.  Subjects 
appeared  unaware  of  the  nature  of  the  study  until  Phase  II  research 
when  they  were  asked  to  identify  each  of  the  profiles. 

Research  Approach  and  Rationale 

Many studies into the relative efficiency of sets of rating
statements have apparently started with a basic set of assumptions: (1)
Raters are subject to leniency error resulting in elevated means and to
halo error revealed by small standard deviations among the ratings
assigned. Since these two forms of rating error are revealed by the
indicated statistics, a study of means and standard deviations forms a
basis for comparison among sets of rating statements which may be used
to distinguish among sets as to their goodness; (2) If rating statements
are meaningful, and if raters are accurate in their perceptions of
ratees, then inter-judge agreement, in the form of correlations among
sets of ratings issuing from different judges, will be an expression of
the goodness of a set of ratings; (3) The most useful way to compare
sets of rating statements with each other lies in the comparisons which
can be made among the summary statistics produced by the ratings. If
one accepts these assumptions, then it follows that the best way to
compare sets of rating descriptions is as it has frequently been done:
the best set is that set which produces lower means, larger standard
deviations, and larger inter-judge correlation coefficients.
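
The three traditional indices named above are simple to compute, which may be part of their appeal. The following sketch shows them for two hypothetical judges rating the same five ratees on one attribute; the data and variable names are invented for illustration and do not come from the study.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    sum_sq_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical 5-point ratings of the same five ratees by two judges.
judge_a = [5, 4, 4, 3, 5]
judge_b = [4, 4, 3, 3, 5]

all_ratings = judge_a + judge_b
summary = {
    "mean (taken as a leniency index)": round(mean(all_ratings), 2),
    "standard deviation (taken as a halo index)": round(stdev(all_ratings), 2),
    "inter-judge r": round(pearson_r(judge_a, judge_b), 2),
}
for label, value in summary.items():
    print(f"{label}: {value}")
```

On the traditional view, a rating set yielding a lower mean, a larger standard deviation, and a higher inter-judge r would be preferred.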

However, the foregoing assumptions are subject to challenge.
Taking them in order: (1) The evidence seems clear that leniency and
halo errors do occur. It is less clear how important these two errors
are in a family of other possible errors (e.g., racial bias, low rater
motivation, low observability of the ratee, and others). It is also
clear that there is not a direct relationship between leniency error and
larger means or between halo error and smaller standard deviations. A
person who is good on one dimension is more likely also to be good on
whatever other dimensions are being considered. This is true whether
the "goodness" metric is derived from ratings, from tests, or from any
other reasonable source. Therefore, some portion of "halo error" may
reflect true conditions, and be no error at all. (2) Inter-judge
agreement may sometimes be a sufficient basis for comparing sets of rating
statements, but it is not unusual for groups of judges to agree on a
decision which additional facts show to be in error. If one may
postulate individual differences among raters in respect to their ability
to perceive ratees accurately, which seems plausible, then one must agree
that some raters will provide better ratings. If some raters are better
than others, it seems naive to expect that their ratings of a given
characteristic will fall eternally at the mean of ratings given on that
characteristic. (3) In this study, an approach is taken which provides
a better basis for making comparisons across rating sets than does the
traditional psychometric comparison. The approach is constructed
around the concept of "hits"; that is, the number of times a rater can
correctly identify anonymous profiles of his peers, constructed around
various sets of descriptor statements.

If a rating statement is useful in describing a person, and if a
group of raters can agree to some extent on the elevation of this
characteristic in a ratee, then a profile of this ratee produced from a
set of such statements should be identifiable as a rating "picture"
of that individual. If a group of raters can recognize the individuals
whom their profiles describe, then it seems more likely that the set of
profiled characteristics can be useful in evaluating or predicting the
performance of those individuals. The number of "hits" (correctly
labeled profiles) should be useful in comparing one set of rating
descriptions with another.

One analysis was made using hits as the dependent variable. The
number of hits, however, at least in prior research (Curton, Ratliff,
& Mullins, 1977), has proved so small that something more sensitive was
needed. A rater could conceivably misidentify the first profile
considered, and that misidentification could cause him to miss the rest,
even if only by a small margin; or he could be so insensitive to
personal differences that he makes guess errors in all the
identifications. The search for a sensitive measure of profile
identification led to the use of the rank-order correlation as a
possibly more effective measure of identification of peers than the
simple count of correct identifications.

If  a  rater  trying  to  identify  anonymous  profiles  of  his  peers  is 
confronted  with  15  profiles,  three  of  which  have  been  rated  very  high  on 
a  particular  characteristic,  and  if  he  believes  correctly  that  peers  B, 
H,  and  J  are  the  three  in  his  peer  group  highest  on  this  characteristic, 
he  may  not  know  which  of  the  three  is  peer  B.    He  might  specifically 
misidentify  all  three  profiles,  although  he  has  been  correct  in  believing 
that  these  three  profiles,  as  a  set,  represent  peers  B,  H,  and  J. 
Although  he  has  come  close,  his  number  of  exact  identifications,  or  hits, 
among these three profiles would be zero, no better than it would be for
some less astute rater who believed B, H, and J were the lowest three in
the peer group on that characteristic. In short, the "hits" measure
contains no provision for crediting near misses, but the correlation
between the ranking of unidentified profiles and the ranking of his
named peers on the success dimensions should provide a continuum which
the raw "hits" metric does not possess. A rank-order correlation
between  these  two  ranks  should  provide  a  sensitive  measure  of  recognition 
far  more  powerful  than  the  simple  count  of  matched  profiles. 
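
The contrast between the two measures can be illustrated with a short
sketch. It is a simplification under stated assumptions: a hypothetical
group of six peers is used instead of the 15 profiles discussed above,
each rater's guesses are reduced to a single list of ranks, and the names
`count_hits` and `spearman_rho` are invented for the illustration. The
rho formula is the standard one for untied ranks.

```python
def count_hits(true_ranks, guessed_ranks):
    """Number of peers placed at exactly the right rank (exact matches)."""
    return sum(t == g for t, g in zip(true_ranks, guessed_ranks))


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for two untied rankings."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: true rank of six peers on one characteristic.
true_ranks = [1, 2, 3, 4, 5, 6]

# Astute rater: knows which peers form the top three and the bottom
# three, but scrambles the order inside each set (no exact matches).
near_miss = [2, 3, 1, 5, 6, 4]

# Less astute rater: places the top three at the bottom and vice versa.
reversed_sets = [5, 6, 4, 2, 3, 1]

for label, guess in [("near miss", near_miss), ("reversed", reversed_sets)]:
    hits = count_hits(true_ranks, guess)
    rho = spearman_rho(true_ranks, guess)
    print(label, hits, round(rho, 3))
```

Both hypothetical raters score zero hits, yet their rank-order
correlations differ sharply in sign and size, which is the kind of
continuum the raw count cannot supply.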

Data  Analysis 

In  order  to  apply  the  metric  described  in  the  preceding  paragraph, 
three  rankings  were  collected.    First,  an  official  ranking  (OR)  of  the 
students  performed  by  the  school  was  available.    Second,  a  ranking  of 
the  anonymous  profiles  (UP)  was  collected.    Finally,  a  ranking  of 
seminar  members  by  their  peers  (PR)  was  collected.    This  ranking  was 
made  using  only  a  list  of  peer  names,  not  profiles,  and  was  made 
according  to  predictions  of  success  in  training. 

The UP and PR rankings were group average ranks derived by summing
all of the assigned ranks for each person in his seminar group, then
converting that total sum of ranks back to a rank order ranging from 1
to 13 or 14 depending on the seminar's group size. These average ranks,
UP and PR, represented a group consensus on the perception of each
seminar member by the group. The Official Class Rank (OR) was
determined by class standing on four exams (312 points), drill
evaluation (25 points), student evaluation (25 points), and
communication skills (38 points).
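
The conversion of summed ranks back to a rank order can be sketched as
follows. The function name `consensus_ranks` and the small data set are
hypothetical, and the treatment of tied rank sums (averaging the tied
positions) is an assumption, since the handling of ties is not described
here.

```python
def consensus_ranks(rankings):
    """Sum each member's assigned ranks over all raters, then re-rank.

    rankings[r][m] is the rank that rater r gave to member m.
    Returns one consensus rank per member (1 = smallest rank sum).
    Tied sums share the average of the positions they occupy; this
    tie rule is an assumption made for the sketch.
    """
    n_members = len(rankings[0])
    totals = [sum(rater[m] for rater in rankings) for m in range(n_members)]
    order = sorted(range(n_members), key=lambda m: totals[m])
    ranks = [0.0] * n_members
    i = 0
    while i < n_members:
        j = i
        # Extend j over every member whose total equals the current one.
        while j + 1 < n_members and totals[order[j + 1]] == totals[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


# Hypothetical example: three raters each rank four seminar members.
example = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
]
print(consensus_ranks(example))
```

In a seminar of 13 or 14 members the same steps would return ranks
running from 1 to the group size.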

Rank-order  correlations  for  each  rater  were  computed  for  the 
following  purposes: 

(1) Correlation between unidentified profile rankings and named
peer rankings (UP-PR): one correlation coefficient was computed for
each rater and was viewed as a more sensitive measure of hits than the
number of exact identifications of unlabeled profiles. This produced
a new variable, the logic of which was explained above.

(2) Correlation between unidentified profile rankings and
official class rank (UP-OR): one correlation coefficient for each rater.
This variable indicates how well the rater can evaluate the operational
criterion (OR) in terms of the statements available. Differences in
effectiveness among the statement sets should be revealed in differences
between the sizes of the average correlation coefficients. Average
correlation coefficients across groups could have been computed by summing
the numerators in the rho formula ($6\sum d^2$) and dividing by the sum
of the denominators ($N(N^2 - 1)$). The squared deviations ($d^2$) were
used in the analyses of variance since in this instance they provided a
simpler and more accurate measurement variable in examining rank order
effect than the correlation coefficients themselves.

(3) Correlation between named peer rankings and official class rank
(PR-OR): one for each rater. The average of this correlation coefficient
would normally indicate the efficiency of peer ratings in predicting a
criterion. In this case, however, there was considerable evidence that
most of the subjects were well aware through intra-group discussion of
how their peers had done on previous tests and were consequently aware
of how they stood on the overall class evaluation. In short, they were
ranking on direct information about their peers rather than judgment
based on indirect knowledge.
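
The pooling mentioned under item (2) can be sketched as below. This is
one reading of the passage, not a confirmed procedure: it assumes the
pooled coefficient is one minus the ratio of the summed numerators to
the summed denominators of the rho formula. The function names and the
two small rankings are hypothetical.

```python
def rho_parts(ranks_a, ranks_b):
    """Return the numerator 6*sum(d^2) and denominator N(N^2 - 1) of rho."""
    n = len(ranks_a)
    numerator = 6 * sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    denominator = n * (n ** 2 - 1)
    return numerator, denominator


def pooled_rho(pairs):
    """Pool several rank comparisons by summing numerators and denominators.

    Assumed form: 1 - (sum of numerators) / (sum of denominators).
    """
    parts = [rho_parts(a, b) for a, b in pairs]
    total_num = sum(num for num, _ in parts)
    total_den = sum(den for _, den in parts)
    return 1 - total_num / total_den


# Hypothetical rankings of unequal length, standing in for groups of
# different size.
pairs = [
    ([1, 2, 3, 4], [2, 1, 3, 4]),
    ([1, 2, 3, 4, 5], [1, 2, 3, 5, 4]),
]
print(round(pooled_rho(pairs), 4))
```

Because the denominators grow with group size, this form weights the
larger groups more heavily than a simple mean of the separate rhos
would.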

The primary analysis included testing to see if significant
differences existed in terms of hits and the other dependent variables
among the three treatment conditions. Since each seminar group was
randomly assigned to one of the three treatment conditions, the
experimental design resulted in the nesting of three seminar groups
under each treatment condition. The hierarchical design (Nested Factors)
is usually used to test the effects among a number of treatments in
certain types of experimental situations (Winer, 1962). Typical examples
include investigating drug effects among a number of hospitals, studying
teaching methods among a number of schools, or studying training methods
among different individuals.

The  hierarchical  ANOVA  is  an  efficient  method  of  studying  such 
experimental  situations  because  it  avoids  multiple  t-tests  or  non- 
orthogonal  comparisons  (Hays,  1963).    The  two-way  hierarchical  ANOVA  in 
this  experiment  is  also  a  more  powerful  statistical  test  than  a  one-way 
ANOVA that only tests for treatment effects, ignoring any group effects.
In  this  design,  the  nested  factors  are  controlled  by  statistical 
procedures.    In  many  experimental  situations,  it  is  dangerous  to  assume 
that  certain  nested  factors  have  no  significant  influence  on  treatment 
effects. 

Two sources of variation were observed in the experimental data.
The treatment effect was of primary interest, whereas the seminar group
affiliation was of secondary interest. The null hypothesis, no
differences between treatment means, was tested for both investigated
sources of variation. The analysis of both sources of variation was
accomplished by performing a two-way hierarchical ANOVA for experiments
with unequal cell sizes using the least-squares procedural method
described by Timm and Carlson (1975).

"Hits" and the sum of the squared differences between UP and PR
rankings, UP and OR rankings, and PR and OR rankings were the dependent
variables used in the ANOVA analysis to determine if significant
differences existed among treatment conditions. The squared difference
between rank orderings was used rather than the rank-order correlations
since the squared difference provided a simpler and more accurate
measurement variable in examining rank order similarity.


Results  and  Discussion 


The hierarchical ANOVA summary for "hits," or correct identification
of profiles, is shown in Table 1. As expected, the "hit" measurement
variable showed no significant differences among treatments. In
essence, the rating "pictures" for each individual produced by the
three different sets of rating statements were equal in their
descriptive power. However, seminar group effects within a treatment
were significant at the .01 level (Table 1). Table 2 shows the summary
results of hits for seminar groups within treatments.


Table 1.  Analysis of Variance of Number of "Hits"
          (Correct Profile Identifications) by
          Treatment and Seminar Group

Source                    Sum of Squares    D.F.    Mean Squares        F

Treatment                        6.215         2           3.107      .349
Seminar Groups
  Within Treatments             53.421         6           8.903     3.247*
Error (Within Groups)          304.379       111           2.742

*Significant at the .01 level.


Table 2.  Number of Profile Identifications (Hits) by
          Treatment and by Seminar Group

                    Treatment 1          Treatment 2          Treatment 3
                  (Seminar Group)      (Seminar Group)      (Seminar Group)
                   F     I     A        C     E     H        B     D     G

Group Results
  Total N         13    14    13       14    13    13       13    13    14
  Total Hits      24    47    45       32    26    48       23    29    42
  Mean Hits     1.86  3.36  3.46     2.29  2.00  3.69     1.77  2.23  3.00
  SD Hits       1.63  1.82  2.37     1.90  1.68  1.55     1.30  1.30   .96

Treatment Results
  Total N               40                   40                   40
  Total Hits           116                  106                   94
  Mean Hits           2.90                 2.65                 2.35
  SD Hits             2.05                 1.83                 1.27

T-Ratios
  Treatments 1 vs. 2 Comparison     t =  .574 (n.s.)
  Treatments 1 vs. 3 Comparison     t = 1.44  (n.s.)
  Treatments 2 vs. 3 Comparison     t =  .85  (n.s.)

n.s. = not significant.


The average rank-order correlations between the pairs of rankings
appear in Table 3. Using Ferguson's (1966) table of significance for
Spearman rhos, 25 of the possible 27 rhos were significant at the .05
level. Furthermore, most of the nine correlations possible in each
treatment group were significant at the .01 level (21 in all), and only
one correlation in each of Treatments II and III was not significant.
All correlations demonstrated a similar pattern of significance in each
of the three treatment conditions. The three rank order comparisons
showed a high degree of agreement. These data analyses suggested that no
one type of rating statement was superior for use in performance
appraisal instruments. The purpose of these rank order comparisons was
to see whether the pattern of significance under each treatment was
generally similar or different. However, the most definitive test for
determining differences between treatments was the hierarchical ANOVA
analysis.

Tables 4 to 6 show the hierarchical ANOVA summary for comparison
of the rating statement treatment conditions with respect to the squared
difference between the following rank order comparisons: UP-PR, UP-OR,
and PR-OR. The ANOVA results showed no significant difference between
treatment conditions as reflected by the squared differences between
the UP-PR rankings (viewed as a more sensitive measure of identification
of unlabeled profiles), the UP-OR rankings (indicating how well the
rater can evaluate the operational criterion in terms of given stimulus
statements), and the PR-OR rankings (normally indicating the efficiency
of peer ratings in predicting a criterion).


Table 3.  Rank Order Correlations Among Unidentified Profile
          Rankings, Peer Rankings, and Official Rank by Treatment
          and by Seminar Group

                                        Treatments
                   I (Worker)           II (Task)           III (Trait)
Rank Order       Seminar Groups       Seminar Groups       Seminar Groups
Comparisons      F      I      A      C      E      H      B      D      G

UP and PR      .58*   .86**  .87**  .85**  .86**  .90**  .79**  .90**  .71**
UP and OR      .52*   .71**  .82**  .43    .65*   .85**  .37    .72**  .70**
PR and OR      .87**  .93**  .97**  .57*   .79**  .94**  .74**  .79**  .97**

Total N         13     14     13     14     13     13     13     13     14

Critical values of rho, the Spearman rank correlation, were obtained
from Ferguson (1959), Table G, p. 414.
*Significant at .05 level.
**Significant at .01 level.


Table 4.  Analysis of Variance of Squared Deviations between
          Unidentified Profile Rankings and Peer Rankings by
          Treatment and by Seminar Group

Source                    Sum of Squares    D.F.    Mean Squares        F

Treatment                     9396.114         2        4698.057      .117
Seminar Groups
  Within Treatments         241470.876         6       40245.146     4.722*
Error (Within Group)        945985.099       111        8522.388

*Significant at .01 level.


Table 5.  Analysis of Variance of Squared Deviations between
          Unidentified Profile Rankings and Official Rankings
          by Treatment and by Seminar Group

Source                    Sum of Squares    D.F.    Mean Squares        F

Treatment                    12127.327         2        6063.663      .0922
Seminar Groups
  Within Treatments         394330.700         6       65721.783    13.031*
Error (Within Groups)       558976.730       111        5035.826

*Significant at .01 level.

Table 6.  Analysis of Variance of Squared Deviations between
          Peer Rankings and Official Rankings by Treatment
          and by Seminar Group

Source                    Sum of Squares    D.F.    Mean Squares        F

Treatments                  119263.060         2       59631.530      .769
Seminar Groups
  Within Treatments         465196.015         6       77532.668    16.160*
Error (Within Groups)       532553.566       111        4797.780

*Significant at .01 level.


The PR-OR rank order coefficient, however, cannot be considered an
unbiased indicator since there was considerable evidence that most
subjects were ranking on information based on knowledge of test
performance acquired through intra-group association, rather than
judgment based solely on observation of peer activities and traits.

Although no significant rank order differences were found between
treatment conditions as reflected by the squared difference of the
various pairs of rankings, the differences between seminar groups within
treatments on all three ANOVA analyses were significant at the .01 level
(Tables 4, 5, and 6). This was an unexpected finding because each
seminar group was randomly assigned to one of the three treatment
conditions. The results demonstrated that no one type of content rating
statement was superior to any other in determining rank order differences.
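
The F ratios reported in these tables can be checked from the sums of
squares and degrees of freedom. The sketch below uses the Table 4
entries; the helper name `nested_anova_f` is hypothetical. The printed
ratios are consistent with testing the treatment mean square against
the mean square for seminar groups within treatments, and the group
mean square against the error mean square, which is the arrangement
assumed in the code.

```python
def nested_anova_f(ss_treat, df_treat, ss_groups, df_groups, ss_error, df_error):
    """Mean squares and F ratios for a two-level nested (hierarchical) ANOVA.

    Treatments are tested against groups-within-treatments; groups are
    tested against the within-group error term.
    """
    ms_treat = ss_treat / df_treat
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "ms_treat": ms_treat,
        "ms_groups": ms_groups,
        "ms_error": ms_error,
        "f_treat": ms_treat / ms_groups,
        "f_groups": ms_groups / ms_error,
    }


# Sums of squares and degrees of freedom as printed in Table 4.
table4 = nested_anova_f(9396.114, 2, 241470.876, 6, 945985.099, 111)
print(round(table4["f_treat"], 3), round(table4["f_groups"], 3))
```

Run on the Table 4 entries, the two ratios round to .117 and 4.722, the
values printed in the table.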


The  data  analyses  showed  that  the  statements  investigated  here 
yielded  no  significant  advantages  for  one  set  of  statements  over  another. 
It  makes  no  difference  whether  the  rating  statements  are  task-oriented, 
worker-oriented, or trait-oriented. This study provides additional
evidence that the doubts of Bell, Hoff, and Hoyt (1963), Borman and
Dunnette (1975), and Kavanagh, MacKinney, and Wolins (1971) about the
superiority  of  job-oriented  dimensions  over  trait-oriented  dimensions 
were  well  founded.    As  Kavanagh  (1971)  concluded  from  his  comprehensive 
literature  review  of  performance  appraisal  studies,  there  is  no  reason 
to  assume  the  superiority  of  job-oriented  statements  over  trait- 
oriented  statements.    The  selection  of  rating  statements  for  inclusion 
in  performance  appraisal  devices  should  primarily  be  determined  by  cost 
considerations.    Cost  considerations  tend  to  favor  trait-oriented 
statements  in  most  situations,  since  job  analysis,  which  is  required  to 
obtain  task-oriented  and  worker-oriented  statements,  is  costly  and  time 
consuming.    Trait-oriented  statements  are  also  much  more  generalizable 
across  different  occupations  than  either  task-oriented  or  worker- 
oriented  statements. 

Unlike many prior studies, this study does not conclude with a
condemnation of judgmental rating statements. This study suggests that
peer group person-oriented statements are as effective as job
descriptive statements when the standard is an external criterion such as
ability  to  recognize  peers  from  unidentified  profiles  or  ability  to 
predict  their  official  class  rank. 

An  unexpected  finding  was  the  significant  effect  associated  with 
seminar  groups  on  all  performed  ANOVA  analyses,  particularly  since  all 
seminar  groups  were  randomly  assigned  to  each  treatment  condition. 
The  importance  of  recognizing  and  controlling  for  group  effects  in  such 
performance  evaluation  studies  is  evident.    Investigated  treatment 
variables  might  easily  become  contaminated  by  group  effects  leading  to 
inaccurate  results  and  conclusions.    The  reasons  for  these  significant 
group  effects  are  unknown,  although  such  intra-group  variables  as 
morale,  leadership,  and  attitude  are  possible  causal  influences. 

It  may  be  that  performance  appraisal  research  emphasis  has  not  been 
placed  on  the  most  important  variables.    Perhaps  there  are  environmental 
influences  that  affect  performance  ratings  more  than  variables  attribut- 
able to  the  appraisal  device.    Perhaps  such  issues  as  content,  format, 
scale,  etc.,  are  relatively  unimportant  as  compared  to  these  other 
variables.    A  need  exists  to  broaden  the  research  focus  in  performance 
appraisal  studies  focusing  on  criteria  independent  and  external  to  the 
performance  appraisal  device. 


Summary  and  Conclusions 


Three different kinds of rating stimulus statements, differing along
a dimension of trait-oriented to task-oriented descriptions, were
compared in a context which permitted the comparisons to be made in
terms of criteria external to the ratings. No evidence of superiority
was found for any of the three sets, although many significant
correlations with various external criteria were obtained in all three
experimental conditions.

Significant differences were also found among the three rating
sub-groups comprising each of the three treatment groups, although these
rating sub-groups were assigned randomly to the three treatment groups.
The importance of controlling for group effects in peer group studies
was noted.


APPENDIX


WORKER-ORIENTED RATING DIMENSIONS

                                                                Well
                               Below                 Above      Above      Out-
                               Average    Average    Average    Average    standing

 1. Military appearance          (A)        (B)        (C)        (D)        (E)

 2. Participates in class
    activities                   (A)        (B)        (C)        (D)        (E)

 3. Communicates clearly by
    oral and written methods     (A)        (B)        (C)        (D)        (E)

 4. Amount of assistance to
    peers in work assignments    (A)        (B)        (C)        (D)        (E)

 5. Completes work in a timely
    manner                       (A)        (B)        (C)        (D)        (E)

 6. Follows provided
    instructions                 (A)        (B)        (C)        (D)        (E)

 7. Takes accurate notes         (A)        (B)        (C)        (D)        (E)

 8. Competence in analyzing
    work assignments             (A)        (B)        (C)        (D)        (E)

 9. Awareness of safety
    precautions                  (A)        (B)        (C)        (D)        (E)

10. Studies well on his own      (A)        (B)        (C)        (D)        (E)


TASK-ORIENTED RATING DIMENSIONS

                                                                Well
                               Below                 Above      Above      Out-
                               Average    Average    Average    Average    standing
                               Effective- Effective- Effective- Effective- Effective-
                               ness       ness       ness       ness       ness

 1. Knows UCMJ programmed
    text                         (A)        (B)        (C)        (D)        (E)

 2. Contributes examples
    in seminar on Discipline
    and Unity of Command         (A)        (B)        (C)        (D)        (E)

 3. Promotes and organizes
    Community Project            (A)        (B)        (C)        (D)        (E)

 4. Analyzes courts-martial
    case study                   (A)        (B)        (C)        (D)        (E)

 5. Participates in Foreign
    Policy role playing          (A)        (B)        (C)        (D)        (E)

 6. Understands reasons for
    nonalignment of
    uncommitted nations          (A)        (B)        (C)        (D)        (E)

 7. Knows history of AF
    uniform                      (A)        (B)        (C)        (D)        (E)

 8. Applies the six-step
    approach to problem
    solving                      (A)        (B)        (C)        (D)        (E)

 9. Knows how to plan a
    conference                   (A)        (B)        (C)        (D)        (E)

10. Researches topic for
    Persuasive Speech            (A)        (B)        (C)        (D)        (E)


TRAIT-ORIENTED RATING DIMENSIONS

                                                                Well
                               Below                 Above      Above      Out-
                               Average    Average    Average    Average    standing

 1. Honesty - straightforward
    and truthful in dealing
    with others                  (A)        (B)        (C)        (D)        (E)

 2. Ambition - works hard,
    accepts challenges           (A)        (B)        (C)        (D)        (E)

 3. Dependability - does
    assigned tasks
    conscientiously without
    close supervision            (A)        (B)        (C)        (D)        (E)

 4. Punctuality - prompt
    in keeping engagements       (A)        (B)        (C)        (D)        (E)

 5. Quality of work - performs
    work accurately
    and effectively              (A)        (B)        (C)        (D)        (E)

 6. Quantity of work -
    produces a large amount
    of work that meets
    requirement standards        (A)        (B)        (C)        (D)        (E)

 7. Initiative - originates
    and achieves goals on
    his own                      (A)        (B)        (C)        (D)        (E)

 8. Adaptability - changes
    attitude and behavior
    to meet the demands of
    the situation                (A)        (B)        (C)        (D)        (E)

 9. Originality - creative,
    thinks of new solutions
    to old problems              (A)        (B)        (C)        (D)        (E)

10. Agreeableness - gets
    along well with fellow
    workers, well liked          (A)        (B)        (C)        (D)        (E)


DIFFERENTIAL RESPONSES ON ALTERNATELY ANCHORED JOB RATING SCALES


Jimmy  L.  Mitchell,  Lt  Col,  USAF 


USAF  OCCUPATIONAL  MEASUREMENT  CENTER 
OCCUPATIONAL  SURVEY  BRANCH 
LACKLAND  AFB,  TEXAS  78236 


A  paper  presented  at  the  Military  Testing  Association  Convention 
30  October  -  3  November  1978 


The  views  expressed  in  this  paper  represent  those  of  the  authors  and 
do  not  necessarily  reflect  the  views  of  the  United  States  Air  Force  or 
the Department of Defense.

A variety of rating scales have been used with job and occupational
data through the years, but very seldom is a rationale given for the use
of a particular scale. Likewise, there have been a number of ways in
which scales have been anchored, but the reasons behind the choice of a
5-point scale over a 7- or 9-point scale have not typically been
reported.

Viteles' job psychograph was developed in 1934; it consisted of a
standard set of psychological traits, each of which was to be rated by a
job analyst as to its "importance" for the job being studied (Viteles;
as cited in Blum & Naylor 1968: 506). The considerable influence of
this  pioneering  work  survives  today  in  the  form  of  trait  ratings,  such 
as  are  used  in  the  Department  of  Labor  job  analysis  system  (Department 
of  Labor  1972)  and  in  the  wide-spread  use  of  5-point  importance  scales 
(cf.  Baehr  1967;  McCormick,  Jeanneret,  &  Mecham  1972). 

In  some  of  the  more  recently  developed  job  analysis  systems,  longer 
scales  have  been  used.    Hemphill  (1959)  in  his  study  of  executive 
positions,  used  a  7-point  Part-of-the-Position  scale  with  three  verbal 
anchors.    The  Air  Force  occupational  analysis  program  used  first  a 
7-point scale and later a 9-point scale measuring relative time spent,
with verbal anchors for each scale point (Morsh 1964; Driskill 1975).
Other job analysis systems have used scales which vary in length from
item to item (Scott 1963; Fine and Wiley 1971).

The  literature  on  scaling  provides  few  clues  as  to  the  optimum 
number  of  levels  for  job  rating  scales.    However,  Matell  and  Jacoby 
(1972)  determined  experimentally  that  if  the  number  of  scale  levels 
exceeds  5,  only  about  60  percent  of  the  scale  will  be  used.  They 
concluded  that  scales  of  no  more  than  five  to  seven  levels  should  be 
adequate  for  most  measurement  purposes. 

Christal  and  Madden  (1961)  have  raised  the  issue  of  being  able  to 
detect  those  jobs  which  would  be  "off  scale"  when  compared  to  other 
jobs.    This  is  an  issue  of  particular  interest  when  a  large  number  of 
jobs  are  to  be  considered  and  one  objective  of  measurement  is  to  be  able 
to  distinguish  between  jobs  which  are  substantially  different.    In  such 
cases,  a  larger  number  of  scale  levels  are  needed  to  insure  that  the 
extreme jobs can be appropriately rated. Thus, in the Air Force
occupational analysis program, a 9-level scale is typically used. This
gives the maximum possible discrimination in a single digit scale and
provides the opportunity to detect extreme jobs in most Air Force
occupational areas.

A  potentially  more  serious  problem  lies  in  the  selection  of  verbal 
anchors  for  the  scale  points  of  job  rating  scales.    Christal  and  Madden 
(1961)  noted  that  it  has  never  been  determined  whether  every  scale  point 
should  have  a  verbal  anchor.    While  most  job  rating  scales  which  have 
been  used  through  the  years  have  provided  such  anchors,  Hemphill  (1959) 
used  a  7-point  scale  with  only  three  verbal  anchors.    Cragun  and 
McCormick  (1967)  used  this  scale  with  Air  Force  officers  in  a  study  of 
the  reliability  of  job  ratings  and  their  results  suggest  that  it  had 
considerable  reliability  and  was  to  some  degree  preferred  by  incumbents 
in  managerial  positions  to  characterize  their  jobs.    Tornow  and  Pinto 
(1976) used the same scale but they compressed it to a five point scale;
they  provided  no  rationale  for  their  modification  of  the  Hemphill  scale 
nor  any  estimate  of  the  effect  of  this  modification  on  their  final  data. 

I  have  not  been  able  to  find  any  definitive  answer  to  the  question 
of  the  anchoring  of  scale  points  in  the  job  analysis  literature. 
However,  in  the  course  of  gathering  and  analyzing  data  for  the 
development  of  a  structured  job  analysis  instrument,  I  chanced  on  some 
interesting  results  which  bear  on  this  issue. 

The instrument being developed was the Professional and Managerial
Position Questionnaire (PMPQ), an experimental structured job analysis
questionnaire for the study of higher level jobs (Mitchell and McCormick
1976). This 93-item questionnaire was developed in the tradition of
McCormick's Position Analysis Questionnaire (PAQ) but was aimed
specifically at executive and management types of positions since
earlier research with the PAQ had indicated that a separate instrument
for higher-level positions might be appropriate (Harris & McCormick
1973).

In  this  new  instrument,  9-point  Part-of-the-Job  and  Complexity 
scales  were  used  with  verbal  anchors  for  every  other  scale  point  (1,  3, 
5,  7,  and  9).    Additionally,  the  Complexity  ratings  were  further 
anchored  with  behavioral  examples;  these  behavioral  examples  were  scaled 
by obtaining independent ratings of a set of examples from panels of
professional and academic industrial psychologists (Mitchell 1978).
Also included in the instrument were items dealing with the personal
requirements for the positions, to determine such things as educational
levels required, prior experience, training, etc., and a section for
other information, such as the number of people supervised, etc. For
these  items,  there  were  numbers,  categories,  or  constructs  which  were 
used  to  anchor  every  point  of  the  scale,  such  as  years  of  education, 
numbers  of  employees,  etc.    Thus,  in  the  same  instrument,  there  were 
both  alternately  anchored  items  (Part-of-the-Job,  Complexity)  and  items 
with verbal anchors for every scale point (Number supervised, etc.).

The PMPQ was used to gather data on 300 positions in 45 companies,
schools, and government agencies throughout the country. The sample of
jobs was quite diverse and salary levels ranged from about $690 per
month  for  an  administrative  assistant  to  over  $6800  per  month  for  an 
executive  vice  president  of  a  major  company.    About  250  cases  had 
complete  data  and  were  useable  in  the  various  types  of  analysis  planned 
for  the  study.    An  analysis  of  the  distribution  of  responses  by  item  was 
not  included  in  the  research  plan  but  in  the  course  of  displaying  some 
of  the  data  for  another  purpose,  it  was  noted  that  some  items  appeared 
to  have  non-normal  distributions.    This  led  to  displaying  the  data  in 
such  a  way  that  the  distribution  of  responses  by  scale  point  was 
visible.    Table  1  gives  a  partial  picture  of  this  data. 

The items at the top of this table are those with alternately
anchored response categories. Items at the bottom have a verbal anchor
for each scale point.

TABLE 1

RESPONSE DISTRIBUTIONS FOR A SAMPLE OF ITEMS FROM THE PMPQ

                                                     RESPONSES
ITEMS                                    0    1    2    3    4    5    6    7    8    9

 1. Work Scheduling (P)                  5    7    2   27   11   74   15   65   18   29
 2. Complexity of Work Scheduling (C)    4   10    4   52   26   85   29   26    8    7
43. Planning/Scheduling (Summary P)      2    6    3   32   19   85   21   55   10   26
67. Formal Education Required            0    4   15   15   27  124    6   36   12   36
87. No. of Nonsupervisory Personnel     55   45   53   28   17   11   15   16   10   10
89. Total No. of Personnel              26   37   56   36   28   16   31   13    6    1

You will note that for the alternately anchored items, the 2, 4, 6, and
8 response categories are consistently lower than are the 1, 3, 5, 7 and
9 categories. This pattern is perhaps even more visible if the data are
plotted as histograms.

Figure 1 gives the distribution of responses for Item 1, which asks
the degree to which an incumbent schedules his or her own work or the
work of others. The verbally anchored scale points are indicated in
this figure by cross hatching while the unanchored scale points are
shown blank. You can see that all response categories were used but
that there is a marked differential in response between the anchored and
the unanchored scale points.

Figure 2 displays the distribution of responses for the second item
in the PMPQ: how complex are the work scheduling activities of the
position? Here, the anchored scale points have not only a verbal anchor
but also have one or two behavioral examples to concretely reference the
level of complexity. Again, there is a marked differential in response
frequency between anchored and unanchored response categories.

Figure 3 represents data from Item 89, which asks the total number
of personnel in units under the supervision or management control of the
incumbent. Here all response categories are concretely anchored with an
interval; for this item, 3 = 10 to 25 people. As you can see from the
distribution of responses displayed in this figure, this is quite a
different kind of distribution. There is no marked difference across
adjacent items in the systematic way seen in Figures 1 and 2. Thus,
there appear to be very major differences in the way individuals respond
to anchored and unanchored rating scales.

We have not yet tested to see if these are significant differences.
Hopefully, this work can be done in the next few months and we can come
to a more concrete conclusion. When this is done, I expect that we will
seek to publish the result as a short note in one of the journals.
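
Ahead of any formal significance test, the pattern can be summarized descriptively by comparing the mean number of responses at the odd scale points (1, 3, 5, 7, 9), which carry the verbal anchors on the alternately anchored items, with the mean at the even points (2, 4, 6, 8). The sketch below is only an illustration and not part of the original analysis: the function name is hypothetical, the 0 category is left out on the assumption that it is a "does not apply" response, and the counts are those shown for Items 1 and 89 in Table 1.

```python
# Counts for scale points 0-9, as given in Table 1.
ITEM_1 = [5, 7, 2, 27, 11, 74, 15, 65, 18, 29]    # alternately anchored item
ITEM_89 = [26, 37, 56, 36, 28, 16, 31, 13, 6, 1]  # every scale point anchored

def odd_even_means(counts):
    """Return (mean count at points 1,3,5,7,9, mean count at points 2,4,6,8)."""
    odd = [counts[p] for p in (1, 3, 5, 7, 9)]
    even = [counts[p] for p in (2, 4, 6, 8)]
    return sum(odd) / len(odd), sum(even) / len(even)

for name, counts in (("Item 1", ITEM_1), ("Item 89", ITEM_89)):
    odd_mean, even_mean = odd_even_means(counts)
    print(f"{name}: odd points {odd_mean:.1f}, even points {even_mean:.1f}")
```

For Item 1 the anchored points average about 40 responses against about 12 for the unanchored points, while the fully anchored Item 89 shows no such gap (about 21 against about 30).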

For the present, this unexpected result has led me to question the
results of some of the earlier research. Would the results of
Hemphill's landmark study of executive positions have been the same had
he used a verbal anchor for all scale points rather than just three
anchors across seven response categories? Would Cragun and McCormick
have come to the same conclusions if they had used a Part-of-the-Position
scale which was completely anchored? Of course, there are no ready
answers to these questions. We have not yet done the research needed to
clarify just what is going on in these cases, nor do we yet have any idea
of the impact of this differential response phenomenon on the major
findings of earlier research.

What is clear is that this is a phenomenon which must be looked
into; we need to learn how this type of differential response tendency
impacts on occupational data and ultimately on management decisions made
with these data.

Figure 1. Distribution of responses from PMPQ Item 1 - Work Scheduling (P)

Figure 2. Distribution of responses for Item 2 - Complexity of Scheduling

Figure 3. Distribution of responses for Item 89 - Total Number of Personnel in Units Supervised


For the present, we must assume that this type of differential
response is not a desirable outcome and thus that alternately scaled
items should be avoided. Until more is known about the impact of
variation in the verbal anchoring of such scales, scales with verbal
anchors for each response category should be used. If verbal anchors
cannot be developed for each scale point, then we perhaps should use a
semantic differential with anchors only at the end points. It would be
interesting indeed to see how our results would vary with these
different anchoring systems; this is an area which really could benefit
from some empirical research.


SAMPLE SIZE AND STABILITY OF TASK ANALYSIS
INVENTORY RESPONSE SCALES


John J. Pass and David W. Robertson
Navy Personnel Research and Development Center
San Diego, California 92152


Problem 

Occupational task analysis inventories are administered on a
recurring basis to hundreds of thousands of personnel in the military
services. While the collected data are used by management for the
specification of occupational standards, the design of training
curricula, and the structuring of occupational specialties, the data
acquisition procedures place heavy time demands on job incumbents.
Typical inventories contain between 800 and 1000 items and can take
over four hours to administer. Thus, the problem is how to minimize
the time demands on the Fleet while selecting sample sizes and
inventory response scales adequate to obtain stable (that is, reliable)
data.


Objective

The objective of the study was to determine empirically the
stability and independence of responses on two task analysis response
scales: the Time-Spent scale and the Task-Performed scale (these scales
are currently in use by the military services; they will be defined
subsequently). A primary concern was the degree of change in
stability as sample size varied.


METHOD


Data 

Task inventory response data (Display 1) were provided by the
Navy Occupational Development and Analysis Center (NODAC). Four Navy
occupational specialties (termed Ratings in the Navy) were selected
for analysis; that is, the Aviation Machinist's Mate, the Electronics
Technician, the Torpedoman's Mate, and the Yeoman. These four
occupations were deemed to be representative of a broad range of
occupational requirements. The data were collected from job incumbents
in a wide variety of both Fleet and Shore activities.


Paper presented at the 20th Annual Conference of the Military
Testing Association, Oklahoma City, Oklahoma, 30 October to 3 November, 1978.

The opinions and assertions contained herein are those of the writers
and are not to be construed as official or reflecting the views of the


Each of the four data sets provided by NODAC was randomly split by
paygrade to obtain eight pairs of independent paygrade samples (for
paygrades E2 to E9), each comprising 50 percent of the paygrade personnel
in the respective total sample.
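
The random split within paygrade can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name, the `(paygrade, responses)` record layout, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict

def split_by_paygrade(records, seed=0):
    """Randomly split records into two independent halves within each paygrade.

    `records` is a list of (paygrade, responses) tuples (a hypothetical layout).
    Returns two dicts mapping paygrade -> list of records.
    """
    rng = random.Random(seed)
    by_grade = defaultdict(list)
    for record in records:
        by_grade[record[0]].append(record)

    sample_a, sample_b = {}, {}
    for grade, members in by_grade.items():
        members = members[:]          # copy so the caller's data is not reordered
        rng.shuffle(members)
        half = len(members) // 2
        sample_a[grade] = members[:half]
        sample_b[grade] = members[half:]
    return sample_a, sample_b
```

Because the split is made separately for each paygrade, every pair of samples stays independent and each half holds about 50 percent of that paygrade's personnel (with an odd count, the second half gets the extra person).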




Inventory  Scale  Response  Data 

The fundamental task analysis data collected by the military
occupational analysis programs are responses to the Time-Spent scale
(a scale developed within the Personnel Research Laboratory of the
U.S. Air Force). This scale is a Likert-type scale of time spent
performing a task, with scale points ranging from "very much" through
"average" to "very little." The Navy program uses a five-point
Time-Spent scale. Other military services use a seven- or nine-point
Time-Spent scale.

The Comprehensive Occupational Data Analysis Programs (CODAP), a
programming package developed and upgraded by personnel of the Human
Resources Laboratory of the Air Force, operates on the Time-Spent
responses and converts these data, as shown in Display 2, to responses
on a Relative Time-Spent scale and to responses on a binary Task-Performed
scale (where a score of 1 indicates the task is performed and 0 indicates
the task is not performed). From these converted response scores,
average scores for a given sample are derived by CODAP for each task in
an inventory. These average score vectors, called job descriptions or
job description profiles (Display 3), contain the most widely used task
analysis information. As shown in Display 3, the first profile
is the percent of personnel performing each task, calculated by taking
the average of responses on the Task-Performed scale. The other two
profiles are averages of responses on the Relative Time-Spent scale.
The profile in the middle of the Display is calculated on scores for
only those personnel who perform the task; that is, personnel with
zero or blank Time-Spent scores are not included in the calculation
of these averages. All the personnel are included in the calculation
of average percentages of Time-Spent for the third profile shown. The
data in this display are actual data derived from scale responses by
paygrade 6 personnel from the Torpedoman's Mate Rating.
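
The conversion and the three profiles described above can be sketched in a few lines. This is only an illustration of the arithmetic, not CODAP itself: the function names are hypothetical, blank responses are represented as `None`, and the example incumbent uses the responses shown in Display 2.

```python
def convert(time_spent):
    """Convert one incumbent's raw Time-Spent responses (0 or blank = not
    performed) to Relative Time-Spent percentages and Task-Performed scores."""
    raw = [r or 0 for r in time_spent]            # treat blank (None) as 0
    total = sum(raw)
    relative = [100.0 * r / total if total else 0.0 for r in raw]
    performed = [1 if r > 0 else 0 for r in raw]
    return relative, performed

def job_description(sample):
    """Return the three profiles for a sample of incumbents: percent
    performing, average time spent by performers, average time spent by all."""
    converted = [convert(person) for person in sample]
    n_people = len(converted)
    pct_performing, avg_performers, avg_all = [], [], []
    for t in range(len(sample[0])):
        rel = [c[0][t] for c in converted]
        n_doing = sum(c[1][t] for c in converted)
        pct_performing.append(100.0 * n_doing / n_people)
        avg_performers.append(sum(rel) / n_doing if n_doing else 0.0)
        avg_all.append(sum(rel) / n_people)
    return pct_performing, avg_performers, avg_all

relative, performed = convert([0, 2, 5, 0, 3])    # the incumbent in Display 2
print(relative)     # [0.0, 20.0, 50.0, 0.0, 30.0]
print(performed)    # [0, 1, 1, 0, 1]
```

Written this way, the third profile for any task equals the first profile (as a proportion) multiplied by the second, which is one way to see why the Percent Performing and Average Time-Spent by All profiles cannot vary independently.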

The present study derived these three profiles for the randomly
drawn independent paygrade samples, and calculated the similarity
between each profile, based on several indices, across samples (Display 4).
Of primary interest was the degree of similarity of the profile data
for corresponding paygrade samples in each rating, as indicated by the
X's in the diagonal of the display matrix. Since the job description
profiles are averages of responses on either the Relative Time-Spent
or the Task-Performed scales, the degree of obtained similarity between
corresponding paygrades indicates the degree of stability of the
responses on these scales.

Stability  Indices 

Three stability indices were calculated on the profile data
(Display 5). All three reflect the stability over all tasks in the
profile or inventory. For the Product Moment (PM) coefficient
calculation, profile tasks were treated as cases, and percentages as
scores.¹ Essentially this coefficient measured the stability of the
relative values or rank order of inventory tasks in terms of Relative
Time-Spent or Task-Performed percentages.

The other two indices measured the stability of the absolute or
actual percentage values for the percent performing profile only.
These indices evaluated the difference in percentages of personnel
performing the same tasks across independent paygrade samples. The
percentage of inventory tasks that met the criteria listed on Display 5,
that is, not exceeding 5 or 10 percentage points difference or not
reaching significance, was the value for the particular index. Pairs
of zero scores on corresponding tasks across samples were not included
in the calculation of any index. The obtained values for certain of
these indices were then plotted against sample size, and eta coefficients
were calculated to measure the relationships. A computerized curve
smoothing procedure (ISSC, 1970, pp. 11-7 to 11-9) was applied to the
plots.
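
The indices described above can be sketched for a single pair of Percent Performing profiles as follows. This is an illustrative reading of the text, not the study's program: the function name is hypothetical, and the pooled two-proportion z test with a two-tailed .05 criterion is an assumption, since the paper names only a "Z test".

```python
import math

def stability_indices(profile_a, profile_b, n_a, n_b):
    """Compare two Percent Performing profiles (values 0-100, one per task).

    Returns (PM correlation, % of tasks within 5 points, % within 10 points,
    % of tasks not significantly different by the assumed z test).
    """
    # Pairs of zero scores are excluded from every index.
    pairs = [(a, b) for a, b in zip(profile_a, profile_b) if a != 0 or b != 0]
    k = len(pairs)

    # Product moment correlation with tasks as cases, percentages as scores.
    mean_a = sum(a for a, _ in pairs) / k
    mean_b = sum(b for _, b in pairs) / k
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    r = cov / math.sqrt(var_a * var_b)

    within_5 = 100.0 * sum(abs(a - b) <= 5 for a, b in pairs) / k
    within_10 = 100.0 * sum(abs(a - b) <= 10 for a, b in pairs) / k

    not_significant = 0
    for a, b in pairs:
        p_a, p_b = a / 100.0, b / 100.0
        pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se if se else 0.0
        not_significant += abs(z) < 1.96      # assumed two-tailed .05 level
    return r, within_5, within_10, 100.0 * not_significant / k
```

The correlation is insensitive to a uniform shift between the two samples, which is why the two absolute-value indices are needed alongside it.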

Independence  of  Responses  to  the  Task-Performed  and  Time-Spent  Scales 

Using the same correlational model previously described, the Product
Moment coefficient was also calculated between the Percent Performing
profile and the Average Time-Spent by All personnel profile. This
analysis was performed between these two profiles since preliminary
results showed marked similarity; that is, a lack of independence
between these profile data.


¹With this correlational model, complete independence of scores
did not exist. That is, the same individuals provided responses for
calculation of a percentage (i.e., score) for more than one task.
However, Cragun and McCormick (1967) report only minor inflation for
coefficients derived with this same model for the study of U.S. Air Force
task analysis inventory reliability.
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RESULTS  AND  DISCUSSION 


For this presentation, only some of the results will be presented.
A technical report which includes all results related to this
presentation and which also includes findings on the relationship between
sample size and cluster solution stability is in preparation.


Comparative Stability and Independence of Responses on the Task-Performed
and Time-Spent Scales

The stability results based on the PM coefficient are presented in
Display 6. As shown, two of the profiles obtained very high median
coefficients, but the profile calculated on Relative Time-Spent values
for only those personnel who perform the tasks obtained relatively
low stability values. The coefficient values for the other two profiles
(i.e., Percent Performing and Average Time-Spent by All) were not only
very high but also appeared to be positively related.

This apparent relationship was investigated to determine the degree
of independence between profile data (Display 7). The very high correlation
coefficients obtained between the Percent Performing and Average Time-Spent
by All profiles within each of eight paygrades for AD and TM are shown in
Display 7. Similar findings on the lack of independence between these two
profiles are reported in a report published by the Human Resources
Laboratory of the U.S. Air Force (see Carpenter, 1974). Thus, there is
little difference between these profiles in terms of distributional shape
or rank order of task scores (see Cronbach and Gleser, 1953). The use
of either profile in determining rank order of tasks will yield very
similar results.

Next the magnitude of these two similar profiles was examined. The
percentages for the Average Time-Spent by All profile are extremely
small in value. The data in Display 3 are sorted from high to low on
the basis of this profile's values. As shown, 1.98 percent is the
largest score for the 337 TM inventory tasks for this sample of
paygrade 6 personnel. Typically, values on this profile for all
paygrade samples analyzed were below 1 percent; that is, an average
of less than 1 percent of time was spent performing any task. The
magnitude of these values makes interpretation difficult. Parenthetically,
small values were also typical for the other Time-Spent profile.
Furthermore, Navy users surveyed reported little or no use of the
Time-Spent data. On the other hand, the percentages of personnel performing
tasks appear meaningful as well as being highly stable.

Other studies indicate additional problems with Time-Spent data;
specifically, a less favorable reaction by job incumbents to using the
Time-Spent scale as compared to other task analysis scales (see Cragun
and McCormick, 1967), a substantial amount of time needed to mark tasks
on the Time-Spent scale (estimated to be about 2.5 hours for 450 items
out of the 800 to 1000 items in a typical inventory [Cragun and
McCormick, 1967]), as well as inconsistent conclusions drawn in regard
to the scale's validity (see Hartley, Brecht, Pagery, Weeks, Chapanis,
and Hoecker, 1977 versus Carpenter, Giorgia, and McFarland, 1975;
also see McCormick, in Dunnette, 1976, p. 670). Hartley et al. (1977)
do report substantially valid rank ordering of tasks by job incumbents
in terms of time spent. In comparison to the Time-Spent data, the
Percent Performing profile based on Task-Performed data is highly
stable and is used regularly by consumers of task analysis information.
Thus, these data were selected to examine in relation to sample size.

Sample  Size  and  Stability  of  Responses  to  the  Task-Performed  Scale 

Display 8 plots the relationship between the correlational stability
index calculated on the Percent Performing data against sample size. For
comparability with other plots, correlation values were multiplied by
100 before plotting. As stated before, this index reflects the stability
of the rank order of tasks for these Task-Performed data. The clearly
asymptotic curve indicates high stability of data for sample size
exceeding about 30 and extremely high stability when the sample exceeded
about 100. This curve shows minimal improvement in stability for increases
in sample size above about 40.

Display 9 shows two curves which plot the percentage of inventory
tasks that did not exceed a difference across samples of either 10 or
5 percentage points. Curve 1 is clearly asymptotic and indicates high
stability for sample size exceeding about 30 and extremely high
stability when the sample exceeded about 100. Curve 2, reflecting the
more rigorous criterion level, indicates very high stability at N above
240, and moderate stability at sample size above 100. The eta
coefficients were .76 and .88 (P ≤ .01, df = 5, 26; see Hays, 1963, formula
16.6.4) for Curve 1 and 2, respectively, which indicate a substantial,
highly significant relationship between sample size and stability.
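
For readers unfamiliar with eta, the coefficient (correlation ratio) is the square root of the between-group sum of squares divided by the total sum of squares, so it captures curvilinear as well as linear relationships. The sketch below is a generic illustration, not the study's computation: the function name is hypothetical, and grouping the stability values by sample-size interval is an assumption (the reported degrees of freedom of 5 and 26 would be consistent with six such groups and 32 paygrade samples, but the paper does not spell this out).

```python
def eta(groups):
    """Correlation ratio: sqrt(SS_between / SS_total).

    `groups` is a list of lists; each inner list holds the stability values
    observed for paygrade samples falling in one sample-size interval.
    """
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return (ss_between / ss_total) ** 0.5
```

An eta near 1 means that nearly all of the variation in stability is accounted for by which sample-size interval a paygrade sample falls in.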

Examination of the curves in relation to each other reveals
additional information. First of all, the curve based on the
correlational index is highly similar to the curve based on the 10 percent
level. Thus, for interpretations of the data for sample size above
about 40, Curve 1 in Display 9 can be considered to also represent the
curve in Display 8 based on the correlational index.

Curve 2 in Display 9 intersects a stability value of about 75
(that is, 75 percent of inventory tasks across samples differed by
less than 5 percentage points) at sample size of about 100. The
question as to the stability (or amount of difference obtained) for
the remaining 25 percent of inventory tasks is answered by examining
the value at which Curve 1 intersects the stability dimension for the
same sample size of 100. The value shown is about 97 percent and
indicates that of the remaining 25 percent of inventory tasks, all
but 3 percent differed by 10 or fewer percentage points. For another
example of information gained by comparing curves, at sample size of
80, Curve 2 intersects the stability dimension at a moderate score of
about 70, but Curve 1, when considered the same as the curve based on
the correlation index (Display 8), intersects at about 95 (that is,
a correlation coefficient equal to about .95). Thus, for this sample
size of 80 the relative values or the rank order of all the tasks in
the inventory are (is) highly stable in terms of percentages of
personnel performing those tasks.
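
Reading the two curves together at one sample size amounts to simple subtraction, as the short sketch below shows for the values quoted at a sample size of about 100. The function name and band labels are hypothetical.

```python
def difference_bands(within_5, within_10):
    """Split inventory tasks into bands from the two cumulative percentages
    read off Curve 2 (5-point criterion) and Curve 1 (10-point criterion)."""
    return {
        "5 points or less": within_5,
        "over 5, up to 10 points": within_10 - within_5,
        "over 10 points": 100 - within_10,
    }

print(difference_bands(75, 97))
# {'5 points or less': 75, 'over 5, up to 10 points': 22, 'over 10 points': 3}
```

In other words, the 3 percent is 3 percent of all inventory tasks, leaving 22 percent of tasks that differed by more than 5 but not more than 10 percentage points.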

Display 10 shows data from all three curves. As shown, sampling
beyond an N of 240 would produce very little gain even in terms of
the most rigorous stability criterion. And if only the rank order of
tasks in terms of numbers of people performing them is required, a
sample size of 100 or even 40 would be acceptable. Consideration of
available personnel and the information displayed resulted in a
recommended total sample size of about 1400, or 45 percent fewer
personnel than the collected data for the AD Rating. A similar
sample size was indicated for the ET Rating; that is, a sample
containing about 1000 fewer personnel than in the existing total
sample. Examination of the total sample sizes for about 36 Ratings
reveals some oversampling for about one-fourth of the Ratings. On
the other hand, an additional 115 personnel to add to the total
sample of TM personnel was indicated by the findings. The application
of these guidelines will enable more cost-effective sampling
(especially realized for the larger Rating populations) and assure overall
stability of results.

It should be noted that the utility of these obtained
relationships relies on the degree of representativeness of the samples
and those to be inventoried. Assuring a representative sample
could require increasing sample size above that indicated by the study's
guidelines. Other factors, such as availability of personnel and
subgroups of special interest, must also be considered in determining
sample size. One other possible limitation concerns the generality
of these findings to other Ratings and to other types of occupational
specialties. It is reasonable to expect the findings to apply to
occupational specialties judged to be as homogeneous as (or more
homogeneous than) paygrades within a Rating.


CONCLUSIONS 


Based  on  the  study's  findings  and  current  task  analysis  procedures, 
it  is  concluded  that  (Display  11): 

1. To substantially reduce administration time, the Time-Spent
scale can be deleted from future task analysis inventories without loss
of practical information. Alternate methods of estimating time spent,
including incumbent ranking of the most time consuming tasks, could be
administered on a trial basis.

2. Responses to currently administered inventory scales could
be used to calculate the percentage of incumbents performing tasks;
CODAP modification is not essential.

3. The study's empirically developed guidelines on sample size
required for stable data can be used as an aid to determine
cost-effective sample sizes that optimize stability.
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DISPLAY  1 
TASK ANALYSIS SURVEYS FOR
FOUR NAVY RATINGS ANALYZED

          Rating                           Inventory Size
Title                      Abbreviation   Total Items   Total Tasks   Sample Size

Aviation Machinist's Mate       AD            1163        [illeg.]        2538
Electronics Technician          ET            1080           597          2548
Torpedoman's Mate               TM             782           337           735
Yeoman                          YN             810           529          2771


DISPLAY 2

FUNDAMENTAL TASK ANALYSIS DATA:
RESPONSES TO THE TIME-SPENT SCALE


                     Job Incumbent

Time-Spent      Relative Time-         Task-Performed
Response        Spent Response (%)     Response

    0                  0                     0
    2                 20                     1
    5                 50                     1
    0                  0                     0
    3                 30                     1
  ----               ----
   10                100%
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DISPLAY  3 

DATA ANALYZED: CODAP JOB DESCRIPTION PROFILES DERIVED FROM
RESPONSES ON THE RELATIVE TIME-SPENT AND TASK-PERFORMED SCALES


                   Job Description Profile

         Percent            Average Time-     Average Time-
Task     Performing (%)     Spent (%)         Spent By All (%)

  1          82.42           [illeg.]              1.98
  2          90.11             1.95                1.76
  3        [illeg.]            1.96                1.46
  4          72.52             1.87                1.35
  5          67.03             1.90                1.28
  6          63.74             1.77                1.15
  7          61.54             1.59                 .98
  8          64.83             1.50                 .97
  9          63.74             1.51                 .96
 10          59.34             1.60                 .94

DISPLAY 4
DETERMINATION OF STABILITY BASED ON
COMPARISONS OF JOB DESCRIPTION PROFILES
DERIVED FOR INDEPENDENT PAYGRADE SAMPLES


                         Sample A Paygrade
Sample B
Paygrade     E2    E3    E4    E5    E6    E7    E8    E9

   E2         X
   E3               X
   E4                     X
   E5                           X
   E6                                 X
   E7                                       X
   E8                                             X
   E9                                                   X
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DISPLAY  5 

STABILITY  INDICES  CALCULATED  ON  JOB  DESCRIPTION  PROFILES 
ACROSS  INDEPENDENT  PAYGRADE  SAMPLES 


Relative Value (Rank Order) Stability

    1.  PM correlation coefficient

Absolute (Actual) Value Stability

    1.  Percentage of corresponding profile tasks that do not exceed:
            a.  5 percent difference
            b.  10 percent difference

    2.  Percentage of corresponding profile tasks that are not
        significantly different (Z test)


DISPLAY 6

COMPARATIVE STABILITY OF JOB DESCRIPTION PROFILES
BASED ON PM CORRELATION COEFFICIENT

             Median Correlation Coefficient Across
                Corresponding Paygrades (E2-E9)

             Percent        Average Time-    Average Time-
Rating       Performing     Spent            Spent By All

AD              .98             .33              .96
ET              .98             .50              .96
TM              .90             .32              .92
YN              .97             .31              .96
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DISPLAY  7 

PM CORRELATION BY PAYGRADE BETWEEN PERCENT PERFORMING
AND AVERAGE TIME-SPENT BY ALL PERSONNEL PROFILES


                                  Paygrade

Rating      E2       E3       E4       E5       E6       E7       E8       E9

AD          94       96       97       97       93       96       96       72
           (67)   [illeg.]   (282)    (337)    (281)    (108)    (31)     (14)

TM          92       94       96       96       94       90       92       83
           (08)     (29)     (66)    (125)     (92)     (36)     (10)     (02)


NOTE: Number in parentheses is number of personnel in paygrade
sample.


DISPLAY 8


DISPLAY  9 


DISPLAY 10
SAMPLE SIZE EFFECT ON STABILITY OF
PERCENTAGES OF MEMBERS PERFORMING INVENTORY TASKS


                     Actual Value                Relative Value
                      Stability              (Rank Order) Stability

Paygrade         5 Percent    10 Percent
Sample Size        Level        Level              Correlation

    40           [illeg.]      [illeg.]                .90
   100              75%        [illeg.]                .97
   240              91%          100%                  .99
   340              96%          100%                  .99
   440              99%          100%                  .99
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DISPLAY  11 
CONCLUSIONS 


1.  The  Time-Spent  Scale  can  be  deleted  to  reduce 
inventory  administration  time 

2.  Responses  to  currently  administered  Inventory 
Scales  could  be  used  to  calculate  percent 
performing  data 

3.  The  study's  empirically  developed  guidelines 
can  be  used  as  an  aid  to  determine  optimal 
sample  size 
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BENCHMARK  SCALES  FOR  COLLECTING  TASK  TRAINING  FACTOR  DATA 


By 

David  C.  Thomson 
and 

Kenneth  Goody 
Occupation  and  Manpower  Research  Division 
Air  Force  Human  Resources  Laboratory 
Brooks  AFB,  Texas 


Introduction 


The Occupation and Manpower Research Division of the Air Force
Human Resources Laboratory (AFHRL) is engaged in research into an
advanced methodology for determining task training priorities (Christal,
1970; Mead, 1976). One element of this research is the development
of benchmark scales for measuring task factors that contribute to
training priority decisions. The type of benchmark scale employed
is a 9-point scale on which each level is represented by three typical
tasks, drawn from a large number of specialties, that illustrate that
level. Scales have been developed for three task factors. They are:
Probable Consequences of Inadequate Performance, Task Delay Tolerance,
and Task Difficulty. In all, three series of scales have been developed,
one for specialties with an Administrative or a General (A/G) aptitude
requirement, the second for specialties with an Electronic (E) aptitude
requirement, and the third associated with a Mechanical (M) aptitude
requirement.

At Annex A is an example of one of the nine scales developed
and validated over the last two years. The development phase of such
a scale has been fully documented and reported by Goody (Psychology
in the USAF Symposium, Apr 76), Goody and Watson (MTA, Oct 76) and
Goody (AFHRL Technical Report 76-15). This paper will not repeat
the description of the development phase of the scales, but will address
the field testing of the scales, their use, and future research areas.

Background 

The  benchmark  scales  were  conceived  as  a  means  to  permit  measurement 
of  task  factors  against  common  frames  of  reference  for  various  specialties. 
It  was  envisaged  that  a  limited  number  of  regression  equations  using 
benchmark  scale  task  factor  data  could  be  computed,  each  applying 
across  a  number  of  specialties,  that  would  predict  task  training  priorities. 
Task  factor  scales  to  date  have  been  of  a  relative  nature,  in  that 
the  ratings  given  on  a  task  were  dependent  on  the  nature  of  the  other 
tasks  in  the  specialty.    While  such  ratings  can  be  used  to  predict 
task  training  priority  within  a  specialty,  a  new  regression  equation 
must  be  computed  for  each  specialty. 
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Field  Testing  of  Benchmark  Scales 

The  benchmark  scales,  as  developed,  were  field  tested  by  comparing 
the  relative  scales  rating  data  with  the  benchmark  scales  rating  data 
over at least two specialties for each of the nine scales. The extensive
range  of  the  study  is  tabulated  in  Table  1,  which  shows  the  specialties 
and  sample  sizes  used  in  the  testing. 

Supervisors,  randomly  drawn  from  each  of  the  specialties  listed, 
were  asked  to  rate  their  own  career  ladder  inventories  on  a  single 
task  factor  using  either  the  relevant  benchmark  scale  or  the  relative 
scale. Using standard techniques, raters were deleted if their task
means  were  significantly  (p<.01)  divergent  from  the  sample  task  means, 
this  fixed  selection  rule  being  applied  to  each  rater  and  each  sample. 
Comparison  of  benchmark  and  relative  sample  sizes  and  percentage  deletions 
of  divergent  raters  could  now  be  made  and  a  conclusion  drawn  about 
the  relative  efficiency  of  the  scales.    These  results  are  shown  in 
Table  2. 

The  next  step  in  the  analysis  of  the  data  was  to  standardize 
the  interrater  reliability  coefficients  so  as  to  suppress  the  effect 
of  rater  response  set  and  permit  direct  comparisons  of  the  reliabilities. 
The  significance  test  used  was  that  developed  by  Haggard  (1958).  The 
test  requires  conversion  of  the  reliability  coefficients  into  Z  scores 
and  then  a  significance  test  on  the  difference  between  the  relevant 
benchmark  and  relative  scale  Zs.    Results  of  those  tests  are  tabulated 
in  Table  3. 
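The conversion-and-difference step described above can be sketched as follows. This is only an illustration: it assumes the ordinary Fisher r-to-Z transform and a normal-theory test on the difference, whereas Haggard's (1958) procedure for intraclass coefficients may differ in detail. The function names and the sample values are hypothetical and are not taken from Table 3.

```python
import math

def fisher_z(r):
    """Fisher r-to-Z transform (an assumed stand-in for the Z conversion)."""
    return math.atanh(r)

def z_difference(r_bench, r_rel, n_bench, n_rel):
    """Normal-theory test on the difference between two transformed coefficients.

    n_bench and n_rel are the numbers of observations behind each
    coefficient (hypothetical here). Returns (z statistic, two-sided p).
    """
    se = math.sqrt(1.0 / (n_bench - 3) + 1.0 / (n_rel - 3))
    z = (fisher_z(r_bench) - fisher_z(r_rel)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area of the normal curve
    return z, p

# Hypothetical coefficients, for illustration only
z, p = z_difference(0.40, 0.30, 300, 300)
print(round(z, 2), round(p, 3))
```

A positive z favors the benchmark coefficient, a negative z the relative one; the 0.05 criterion used in the Findings would be applied to p.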

Finally, to test whether raters using the benchmark scales converge
on  the  same  vector  as  they  do  using  the  relative  scales,  the  benchmark 
raw  vectors  of  task  means  were  correlated  with  the  corresponding  relative 
scale  raw  vectors  of  task  means.    Pearson  correlation  coefficients 
are  tabulated  in  Table  4. 
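As a minimal sketch of this last step, the Pearson coefficient between two vectors of task means can be computed as below. The task means shown are hypothetical; the function name is illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical task means for five tasks rated on each kind of scale
benchmark_means = [2.1, 4.8, 6.5, 3.3, 7.9]
relative_means = [1.9, 4.1, 5.2, 3.8, 6.7]
r = pearson_r(benchmark_means, relative_means)
print(round(r, 2))
```

A coefficient near 1 indicates that the two scales order the tasks in nearly the same way, even if the raw ratings sit on different numerical ranges.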

Findings 

Although  raters  using  the  benchmark  scales  have  to  use  technical 
knowledge  outside  their  past  and  current  job  experiences,  it  was  found 
that on the average only 10% of those raters had to be deleted compared
with an average 16% of each sample of raters using the relative scales.
This significant difference in percentage rater deletions implies
that by using benchmark scales, generally smaller samples can be used,
with  the  associated  cost  savings  benefits,  to  achieve  equally  good 
reliabilities. 

At  a  probability  of  0.05,  the  benchmark  rater  agreement  coefficients 
are  significantly  higher  than  the  relative  rater  agreement  coefficients 
in  14  comparisons,  not  significantly  different  in  10  comparisons, 
and significantly lower in 3 comparisons. Investigation of these
latter 3 cases showed that the raters were not sufficiently familiar
with the tasks on particular benchmark scales to be able to make reliable
ratings. Future research needs to address the question of which
subsets of raters are sufficiently knowledgeable to use the benchmark
scales reliably.
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Of  the  27  Pearson  correlation  coefficients,  only  nine  are  below 
.85  and  of  these  only  two  are  below  .72.    In  those  two  cases,  low 
relative scale interrater reliabilities contributed strongly to the
poor correlations. But as high correlations were generally found
to be the rule, it can be concluded that raters using the benchmark
scales do rank the tasks in the same order as raters using the traditional
relative scales.


Future  Research 

Although not discussed in depth in this paper, the problem of
raters not being sufficiently familiar with the tasks listed on the
benchmark scales does exist. Future research needs to address this
problem. One way around the problem is to accept the technology for
what it is, and develop benchmark scales to address questions across
a limited number of similar specialties (e.g. aircraft
systems maintenance) such as exist in a career field.

There are some indications in the research data that supervisors
using the benchmark scales and rating their own career ladder inventory
tend to inflate their ratings. That is, they tend to indicate that
the  task  difficulty  is  higher  than  it  really  is,  that  the  acceptable 
delay  before  a  task  must  be  performed  is  smaller  than  it  really  is, 
and  that  the  consequences  of  not  doing  a  task  properly  are  much  worse 
than  they  really  are.    Furthermore,  this  inflation  does  not  appear 
to  be  constant  or  even  predictable.    Research  must  address  and  solve 
this problem before task factor comparisons across large numbers of
specialties  can  be  made.    Developing  special  benchmark  scales  for 
use  within  career  fields  may  help,  since  the  problems  of  rater  inflation 
and  raters  being  unfamiliar  with  tasks  on  the  scales  should  be  less. 

Conclusion 

Benchmark scales will allow experienced raters to provide better
interrater agreement than do the relative scales, and the desired level
of stability of the means is obtained more efficiently as fewer rater
deletions are necessary. Furthermore, these scales yield
the correct rank ordering of the tasks on the different task factors.
However, considerable effort and care is needed to ensure that
the intended raters have a reasonable amount of familiarity with the
tasks that define the various points on each benchmark scale, and
inflation of ratings for the raters' own specialty should be expected.
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Table  1 


NUMBER OF RATERS BY FACTOR AND TYPE OF RATING SCALE
FOR 11 AFS IN FINAL VALIDATION STUDY

Air Force Specialty                           Minimum Aptitude Requirement

293X3  Radio Operator                         A60
651X0  Procurement                            A70
531X5  Non Destructive Inspection             G50
906X0  Medical Administration                 G60
304X4  Ground Radio Communication Equip       E80
304X0  Radio Relay Equip                      E80
423X4  Pneudraulic Repair                     E, M40
552X5  Plumbers                               M40
423X1  Environmental Systems                  M40
427X5  Airframe Repair                        M40
631X0  Fuel Specialists                       G, M40
Average

Number  of  Raters 


Consequences 
Bench.  Relative 


51 
67 
61 

77 

66 

39 
60 

69 
52 

71 
71 
62 


45 
61 


105 
60 
35 

82 
33 

63 
61 


Delay  Tolerance 
Bench.  Relative 


49 
71 
67 

87 

57 

39 
71 
66 
52 

77 
71 
64 


50 
63 


104 
58 
50 

62 
34 

65 
61 


Task  Difficulty 
Bench.  Relative 

78 


49 
59 
67 

101 

55 

44 
69 
69 
52 

74 
74 
65 


101 
55 

78 

122 

* 

116 
77 

75 

75 
85 


6^ 
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Table  2 

PERCENTAGE OF RATERS DELETED

                                                  Consequences    Delay Tolerance   Task Difficulty
Specialty/Aptitude                                Bench.  Rel.    Bench.  Rel.      Bench.  Rel.

293X3 Radio Operator A60                             8     20       12     24         16     24
651X0 Procurement A70                               12      5       17     19          5     22
906X0 Medical Administration G60                     3      5        5      5          2      6
531X5 Non Destructive Inspection G50                 7      -       25      -          6     16
304X4 Ground Radio Communication Equipment E80       5      7       14     16          0     11
304X0 Radio Relay Equipment E80                      5      3        0     20          2     11
423X4 Pneudraulic Repair E, M40                     10      -       11      -         10     11
552X5 Plumbers M40                                  16      6       28     27          3     34
423X1 Environmental Systems M40                      6      6       10     29          6     14
427X5 Airframe Repair M40                           20      5       21     20         11     28
631X0 Fuel Specialists G, M40                       20      -       11      -          7     25

Average Percentage Raters Deleted                   10      7       14     20          6     18
Average Final Sample Size                           52     54       51     46         56     53

Table  3 


COMPARISON OF STANDARDIZED R11 VALUES DERIVED FROM BENCHMARK AND RELATIVE SCALE DATA

                       Benchmark  Relative                        Number
Specialty/Task Factor  Standard.  Standard.  Benchmark  Relative  Bench.  Relat.  of
                       R11        R11                             K       K       Tasks

TASK  DIFFICULTY 

293X3  Radio  Operator 
651X0 Procurement


304X0  Radio  Relay  Equipment 
423X4 Pneudraulic Repair
552X5  Plumbers 
423X1  Environmental  Systems 
427X5  Airframe  Repair 
631X0  Fuel  Specialists 

CONSEQUENCES OF INADEQUATE PERFORMANCE

293X3  Radio  Operator 
651X0  Procurement 


^57 
•  jji 

221 

16.6 

11.9 

28.0 

38.5 

345 

3.02 

<.05 

.395 

.321 

36.3 

30.2 

54.1 

61.7 

328 

1.65 

.10 

.342 

.185 

30.8 

•  10.5 

57.3 

42.0 

230 

8.03

<.05 

.460 

.428 

80.3 

35.2 

93.1 

45.8 

813 

11.67 

<.05 

.412 

.335 

35.5 

32.9 

49.3 

63.3 

730 

1.03 

.30 

.394 

.259 

26.9 

22.7 

39.7 

62.1 

322 

1.49 

.14 

.292 

.297 

23.9 

23.3

55.4 

52.8 

575 

.28 

.78 

.333 

.310 

33.1 

34.1 

64.3 

73.5 

407 

-  .30 

.76 

.280 

.312 

17.9 

23.5 

43.4 

49.7 

736 

-3.69 

<.05 

.357 

.302 

36.1 

22.1 

63.1 

48.7 

252 

3.85 

<.05 

.345 

.340 

35.7 

22.5 

65.7 

41.8 

374 

4.40 

<.05 

552X5  Plumbers 
423X1 Environmental Systems
427X5  Airframe  Repair 

TASK DELAY TOLERANCE


293X3 Radio Operator               .370  .325
651X0 Procurement                  .276  .246
906X0 Medical Administration       .258  .251
304X4 Ground Radio Com. Equip.     .282  .129
304X0 Radio Relay Equipment        .283  .268
552X5 Plumbers                     .168  .155
423X1 Environmental Systems        .243  .217
427X5 Airframe Repair              .330  .246


.369 

.314 

20.8

16.1 

34.0 

33.0 

345 

2.34 

<.05 

.222 

.217 

16.4 

16.7 

53.9 

56.6 

328 

-.18

.86 

.230 

.258 

22.5 

34.2 

72.0- 

95.4 

813 

-5.91 

<.05 

.281 

.265 

23.6 

20.7 

57.7 

54.6 

730 

1.75 

.80 

.403 

.247 

25.4 

11.7 

36.3 

32.8 

322 

6.82 

<.05 

.159 

.265 

11.7 

27.3 

56.9 

72.8 

407 

-8.43

<.05

.305 

.277 

20.5 

11.5 

44.4 

27.4 

736 

7.74 

<.05 

.382 

.274 

35.4 

23.4 

55.6 

59.3 

252 

3.24 

<.05 

19.9 

15.5 

32.2 

30.0 

345 

2.27 

<.05 

22.4 

17.0 

56.1 

48.8 

328 

2.49 

<.05 

28.9 

30.8 

80.2 

89.0 

813 

-  .91 

.37 

19.1 

7.7 

46.0 

45.5 

730 

12.09 

<.05 

15.3 

14.4 

36.4 

36.6 

322 

.56 

.57 

10.9 

9.0 

49.1 

43.5 

407 

1.92 

.05 

14.5 

6.8 

42.1 

20.9 

736 

10.07 

<.05 

2U 

17.7 

57.5

51.3 

252 

3.93 

<.05

Irane  tepalt  .330       ,  246        2M      ll.l  Ji-i  ^ 

►  m  i  .  610  f 


Table  4 

PEARSON CORRELATION COEFFICIENTS

Specialty/Aptitude                                  Consequences   Delay Tolerance   Task Difficulty

293X3 Radio Operator A60                                .91             .92               .59
651X0 Procurement A70                                   .89             .91               .93
906X0 Medical Administration G60                        .85             .94               .94
531X5 Non Destructive Inspection G50                     -               -                .73
304X4 Ground Radio Communications Equipment E80         .92             .73               .92
304X0 Radio Relay Equipment E80                         .87             .90               .78
423X4 Pneudraulic Repair E, M40                          -               -                .82
552X5 Plumbers M40                                      .82             .47               .89
423X1 Environmental Systems M40                         .94             .72               .92
427X5 Airframe Repair M40                               .81             .85               .93
631X0 Fuel Specialists G, M40                            -               -                .91

TASK DELAY TOLERANCE
(Electronic)

DEFINITION

The Task Delay Tolerance of a task is a measure of how much delay can be tolerated between the time an airman becomes
aware the task is to be performed and the time he must commence doing it.

BENCHMARK SCALE
Level 9 - Most Tolerance of Delay - Do when ready

Clean or paint missile facilities or equipment (Missile Systems Maintenance Specialist)
Wash, clean or inspect maintenance vehicles (Flight Facilities Equipment Specialist)
Write test questions (Avionic Inertial and Radar Navigation Systems Specialist)

Level 8

Revise technical orders or indices (Weather Equipment Repairman)
Inventory bench stock, equipment or supplies (Flight Facilities Equipment Specialist)
Maintain electrical storage battery records (Telephone Switching Equipment Repairman)

Level 7

Clean parts or components using solvents (Avionic Navigation Systems Specialist)
Locate part or stock numbers in federal supply catalogs (Precision Measurement Equipment Laboratory Specialist)
Prepare or maintain Explosive Ordnance Disposal reports (Munitions Disposal Specialist)

Level 6

Change oil in antenna drive assemblies (Air Traffic Control Radar Repairman)
Analyze computer logic diagrams (Electronic Computer Systems Repairman)
Trace underground power cables using cable test set (Electrical Power Line Specialist)

Level 5

Tighten bolts or nuts to specified torques (Missile Systems Analyst Specialist)
Troubleshoot aircraft radio switching systems (Avionic Communications Specialist)
Perform operational tests on angle-of-attack or sideslip transmitters (Integrated Avionics Component Specialist)

Level 4

Test or check safety devices such as valves, regulators, or alarms on biomedical equipment (Biomedical Equipment
Maintenance Repairman)
Load nuclear bombs, warheads or reentry vehicles onto transport aircraft (Nuclear Weapons Specialist)
Repair or adjust aircraft cockpit latches or locks (Aircrew Egress Systems Repairman)

Level 3

Perform inflight analysis of malfunctions in automatic tracking radar (Auto Tracking Radar Repairman)
Target or retarget guided missiles (Missile Systems Analyst Specialist)
Install nuclear weapon fusing systems (Weapons Mechanic)

Level 2

Perform nuclear bomb safety checks (Nuclear Weapons Specialist)
Monitor aircraft engine instruments during flight (Flight Engineer Specialist)
Check aircraft for armament safety (Weapons Control Systems Mechanic)

Level 1 - Least Tolerance of Delay - Must do immediately

Conduct emergency shutdown of missile launch facility (Missile Systems Analyst Specialist)
Render aircraft emergency egress systems safe after crash (Aircrew Egress Systems Repairman)
Perform emergency shutdowns of high pressure boilers (Plant Operator)
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WEIGHTED  SELECTION  SYSTEM  FOR  AFROTC  APPLICANTS  ~ 
PERSPECTIVE  AFTER  SECOND  YEAR  OF  USE 

Lieutenant Colonel David K. Jackson
Mr.  M.  Meriwether  Gordon,  Jr. 


At  last  year's  meeting  of  the  Military  Testing  Association,  AFROTC 
representatives  reported  on  the  "Development  of  a  Weighted  Selection 
System"  for  admitting  applicants  into  the  AFROTC  Professional  Officer 
Course.    This  course  is  the  last  two  years  of  the  four-year  AFROTC 
Program and leads to an Air Force commission on graduation.

The  weighted  system  has  come  to  be  known  as  WPSS  (pronounced 
WEEP-us)  standing  for  Weighted  POC  Selection  System.    The  system  was 
developed on the basis of the findings of a model selection board held
at Maxwell AFB. With the assistance of Human Resources Laboratory,
statistical "policy capturing" techniques were applied to the board's
findings. The variables considered by the board in rank-ordering
applicants for the program were processed through a system known as
"Hierarchical Grouping" and assigned weights in accordance with the
contribution each made to the individual's rank-order. About ninety
variables were considered and reduced to eleven which were identified as
contributing significantly. Those variables and their weights together with
the  number  of  points  each  contributes  to  the  total  Quality  Index  Score 
(QIS)  derived  by  the  system  are  listed  in  Table  1.    Note  that  the 
computations  are  based  on  an  assumed  mean  of  each  of  the  variables. 
These  are  the  actual  means  that  were  attained  on  each  after  all 
applications  were  received  and  the  overall  means  computed. 


TABLE 1
Eleven Variables
Constituting the QIS

VARIABLE                       MEAN       WT      POINTS   % OF SCORE

AFOQT - Quality Composite      45.5     0.1381     6.28     ( 8.4)
SAT Score                    1054.2     0.0245    25.83     (34.5)
Cumulative GPA                277.3     0.1005    27.87     (37.2)
PAS Rating                      3.2     1.7975     5.75     ( 7.7)
ASTIN Rating                    3.5     0.7172     2.51     ( 3.3)
AFROTC GPA                    224.2     0.0130     2.91     ( 3.9)
AFOQT - Quantitative           47.0     0.0459     2.16     ( 2.9)
Type Program                    0.6     1.5837     0.95     ( 1.3)
Academic Major                  0.4     2.5949     1.04     ( 1.4)
Number of Applicants           36.6     0.0222     0.81     ( 1.1)
Applicants Rank                14.7    -0.0870    -1.28     (-1.7)

Quality Index Score:                              74.83
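The arithmetic behind Table 1 can be sketched as a plain weighted sum. The sketch below assumes that the QIS has no constant term, which is consistent with the listed points adding up to the listed total; the dictionary and function names are illustrative and not part of the WPSS itself.

```python
# Weights and means as listed in Table 1
WEIGHTS = {
    "AFOQT Quality Composite": 0.1381,
    "SAT Score": 0.0245,
    "Cumulative GPA": 0.1005,
    "PAS Rating": 1.7975,
    "ASTIN Rating": 0.7172,
    "AFROTC GPA": 0.0130,
    "AFOQT Quantitative": 0.0459,
    "Type Program": 1.5837,
    "Academic Major": 2.5949,
    "Number of Applicants": 0.0222,
    "Applicants Rank": -0.0870,
}
MEANS = {
    "AFOQT Quality Composite": 45.5,
    "SAT Score": 1054.2,
    "Cumulative GPA": 277.3,
    "PAS Rating": 3.2,
    "ASTIN Rating": 3.5,
    "AFROTC GPA": 224.2,
    "AFOQT Quantitative": 47.0,
    "Type Program": 0.6,
    "Academic Major": 0.4,
    "Number of Applicants": 36.6,
    "Applicants Rank": 14.7,
}

def quality_index_score(applicant):
    """Weighted sum of the eleven variables (no constant term is assumed)."""
    return sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)

qis_at_means = quality_index_score(MEANS)
# Percentage share of each variable in the score of the "average" applicant
shares = {name: 100 * WEIGHTS[name] * MEANS[name] / qis_at_means for name in WEIGHTS}
print(round(qis_at_means, 1), round(shares["SAT Score"], 1))
```

Evaluated at the means, the sum agrees with the tabled 74.83 to within rounding of the individual point values, and the shares reproduce the percentages in the last column.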
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The  Officer  Quality  Composite  of  the  Air  Force  Officer  Qualifying 
Test  is  shown  as  contributing  8.4%  to  the  total  score  while  the  SAT 
is  shown  as  contributing  34.5%.    These  figures  merit  additional  quali- 
fication.    Where  students  have  ACT  scores  instead  of  SAT  scores,  the 
scores  are  converted  to  SAT  equivalents.     If  the  students  have  both 
ACT  and  SAT  scores,  they  may  convert  the  ACT  scores  to  SAT  equivalents 
if  the  conversion  results  in  a  higher  score.    Where  students  lack  either 
ACT  or  SAT  scores,  they  are  allowed  to  convert  their  Officer  Quality 
scores if they benefit thereby. Indeed, they may convert the Officer
Quality Score in any case where this would be advantageous to them.
In addition, the AFOQT Quantitative score (a sub-test of the Officer
Quality score) is counted separately and contributes 2.9%. Thus, the
Officer Quality Score may contribute much more heavily than the figures
seem to indicate. Standardized tests in toto (ACT/SAT/AFOQT) contribute
45.8% of the total Quality Index Score, with the cumulative GPA as
the next highest contributor at 37.2%. Standardized test scores and the
grade point average in combination contribute 83% of the total Quality
Index Score.
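The score-substitution rule described above amounts to taking the most favorable SAT-equivalent among the scores an applicant holds. A minimal sketch follows; the conversion tables themselves are not given here, so the converted ACT and Officer Quality values are simply taken as inputs, and the function and parameter names are hypothetical.

```python
def sat_equivalent(sat=None, act_as_sat=None, oqc_as_sat=None):
    """Return the most favorable SAT-equivalent score among those available.

    act_as_sat and oqc_as_sat are ACT and Officer Quality scores already
    converted to the SAT scale (conversion tables are assumed, not shown).
    """
    candidates = [s for s in (sat, act_as_sat, oqc_as_sat) if s is not None]
    if not candidates:
        raise ValueError("no test score available")
    return max(candidates)

# Hypothetical applicant whose converted ACT score exceeds the actual SAT score
print(sat_equivalent(sat=1000, act_as_sat=1050))
```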

The  Professor  of  Aerospace  Studies  (PAS)  rating  is  done  on  a 
scale of 0 to 4 and amounts on the average to 7.7% of the total score.
The  PAS  can  exert  a  little  additional  influence  on  the  overall  score 
through  the  rank-order  of  the  applicant.     The  rank-ordering  is  done 
either  by  the  PAS  or  by  a  local  board  of  which  the  PAS  is  usually  a 
member.     (Note  the  negative  weight  of  the  applicant's  rank  among  those 
ranked.)

The  "Astin  Rating"  is  a  college  selectivity  rating  on  a  scale  of 
1  to  7  devised  and  published  by  Dr.  Alexander  W.  Astin  in  his  book 
entitled  Predicting  Academic  Performance  in  College. 

The  "Type  Program"  variable  provides  the  applicant  some  credit 
for  participation  in  the  four-year  programs  over  the  two-year  program. 
Its value is either 0 or 1 times its weight.

The  "Academic  Major"  variable  provides  credit  to  those  applicants 
with  desired  scientific/engineering  academic  majors.  Again,  its  value 
is  either  0  or  1  times  its  weight. 

The  Pilot  and  Navigator-Technical  Composites  of  the  AFOQT  are  not 
factors  in  the  QIS.     Nevertheless,  they  are  powerful  as  qualifiers  for 
the  program  since  applicants  are  not  eligible  for  consideration  under 
the  WPSS  as  potential  pilots  or  navigators  unless  they  have  attained 
at least the minimum requirements set by the Air Force on these composites.

Since  standardized  test  scores  in  combination  are  the  most  heavily 
weighted  factor  in  the  system  and  no  statistical  distinction  is  made 
between  SAT,  ACT,  and  Officer  Quality  Scores,  the  correlation  matrix 
in Table 2 is of interest. The matrix also includes the Verbal and
Quantitative sub-composites of the Officer Quality score, the applicant's
grade-point-average (4.00 scale), and the Quality Index Score (QIS)
derived by the system. Only those applicants possessing both SAT and
ACT scores were used. The coefficients to the left were derived from
287 applicants possessing all three scores who applied in FY 77. The
coefficients to the right (in parentheses) were derived from 341 such
applicants in FY 78.


TABLE 2
Correlation Matrix
(Pearson Product-Moment)

          ACT        SAT        OQC*       VERB       QUANT      GPA        QIS

ACT        -       .85(.83)   .73(.67)   .66(.68)   .58(.57)   .21(.26)   .74(.72)
SAT     .85(.83)      -       .72(.66)   .65(.66)   .58(.58)   .27(.29)   .76(.73)
OQC*    .73(.67)   .72(.66)      -       .74(.69)   .77(.77)   .21(.26)   .81(.79)
VERB    .66(.68)   .65(.68)   .74(.70)      -       .45(.42)   .21(.17)   .62(.56)
QUANT   .58(.57)   .58(.58)   .77(.77)   .44(.42)      -       .15(.24)   .67(.73)
GPA     .21(.26)   .27(.29)   .21(.26)   .21(.17)   .15(.24)      -       .59(.63)
QIS     .74(.72)   .76(.73)   .81(.79)   .62(.56)   .67(.73)   .59(.63)      -

It is interesting to note that the ACT and AFOQC predict the grade-
point-average to an equal degree while the SAT, probably the most highly
and systematically standardized test in existence, predicts it only slightly
better. Indeed, one might reasonably contend that the three tests are
about equally predictive of academic success as measured by the grade-
point-average.

Some  individuals  have  expressed  surprise  and  dismay  at  the  seemingly 
low  correlations  between  standardized  test  scores  (SAT/ACT/OQC)  and  the 
grade-point-average.    All  of  these  tests  purport  to  predict  academic 
success. 

It must be remembered that among students taking the SAT and ACT
many low scorers are dissuaded from going to college and are not present
to be included in the validity data. Many others for whom the tests
accurately predicted failure did not survive in college long enough to
be included in this group of applicants for the advanced AFROTC program.
Finally, any students for whom the test inaccurately predicted failure
are still present and count against the test's validity. For those
reasons, at this stage of the game (the end of the sophomore year) these
coefficients should be considered quite good.
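The effect of this kind of selection on a validity coefficient can be illustrated with a small simulation. The sketch below uses entirely synthetic data: the population correlation of 0.6, the cut-off of half a standard deviation above the mean, and the sample size are arbitrary assumptions chosen only to show the direction of the effect.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1978)   # fixed seed so the run is repeatable
TRUE_R = 0.6                # assumed population correlation (synthetic)

test_scores, grades = [], []
for _ in range(5000):
    t = rng.gauss(0.0, 1.0)
    g = TRUE_R * t + math.sqrt(1.0 - TRUE_R ** 2) * rng.gauss(0.0, 1.0)
    test_scores.append(t)
    grades.append(g)

r_full = pearson_r(test_scores, grades)

# Keep only "survivors" whose test score exceeds the assumed cut-off
kept = [(t, g) for t, g in zip(test_scores, grades) if t > 0.5]
r_restricted = pearson_r([t for t, _ in kept], [g for _, g in kept])

print(round(r_full, 2), round(r_restricted, 2))
```

The coefficient computed on the selected group falls well below the coefficient in the full group, even though the underlying relationship between test score and grades is unchanged.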

*The OQC scoring scale is restricted in range in comparison with the ACT and
SAT scales. If OQC scores were free to vary over the same range as their ACT
and SAT counterparts, their correlation with the other two tests (and with the
GPA) would probably be higher.
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It  is  also  interesting  to  note  that  the  Air  Force  Officer  Quality 
score stands up well in the company of its highly prestigious competitors.
Indeed, any one of the three scores could readily be accepted as a
predictor of academic success in lieu of either of the other two.

***************************** 

For  many  years,  AFROTC  allowed  its  separate  detachments  to  make 
their  selection  for  entry  in  the  Professional  Officer  Course  (POC) 
locally  in  the  same  manner  that  other  college  programs  admit  their 
applicants.    Local  selection  was  allowed  despite  the  knowledge  that  the 
lowest  individual  Officer  Quality  score  at  some  of  the  highly  selective 
institutions  was  higher  than  the  highest  score  at  some  of  the  low 
selectivity  institutions.     However,  the  decline  in  the  number  and 
quality  of  applicants  that  ensued  after  the  advent  of  the  all-volunteer 
force  made  it  apparent  that  AFROTC  would  have  to  exert  strong  quality 
controls  to  insure  a  continued 'rfitfcfi  officer  corps.     Central  selection 
of  applicants  seemed  a  necessary  measure  though  one  that  AFROTC  was 
reluctant to take.

The  new  WPSS  has  proved  to  be  a  more  than  adequate  compromise 
between local and central selection. Under the new system, Air Force
Professors of Aerospace Studies (Detachment Commanders) are allowed to
fill their enrollment quotas locally with students possessing Quality
Index Scores of sixty-three or above prior to a given cut-off date.
The names of those with scores above sixty-three in excess of quotas or
who apply after the cut-off date and those with scores below sixty-three
are submitted to Maxwell AFB for central selection. The names of those
selected  locally  are  also  submitted  for  official  confirmation.  Thus, 
all, or nearly all, of the selections are still made locally at the more
selective institutions, while selections at the less selective
institutions are partially made by central board. While in theory a local
selectee  might  be  thrown  out  in  favor  of  a  more  highly  qualified  central 
selectee,  this  in  fact  did  not  happen. 
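The routing rule described above can be sketched as a single decision function. The threshold of sixty-three comes from the text; the function name, the parameter names, the dates, and the treatment of an application filed exactly on the cut-off date as timely are assumptions made for illustration.

```python
from datetime import date

LOCAL_QIS_MINIMUM = 63  # threshold stated in the text

def selection_route(qis, applied_on, cutoff, local_quota_remaining):
    """Return "local" or "central" for one applicant.

    Local selection requires a QIS at or above the threshold, an
    application no later than the cut-off date (assumed inclusive),
    and an unfilled place in the detachment's quota.
    """
    if (qis >= LOCAL_QIS_MINIMUM
            and applied_on <= cutoff
            and local_quota_remaining > 0):
        return "local"
    return "central"

# Hypothetical dates, for illustration only
cutoff = date(1978, 4, 1)
print(selection_route(70, date(1978, 3, 1), cutoff, 5))   # qualifies locally
print(selection_route(70, date(1978, 5, 1), cutoff, 5))   # late, goes to the central board
```

Local selections would still be forwarded for confirmation, so in practice every name reaches the central board; the function only indicates where the initial decision is made.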

The  new  system  allows  AFROTC  to  enjoy  simultaneously  the  best  aspects 
of  both  local  and  central  selection.    Indeed,  it  is  possible  to  make  the 
seemingly  paradoxical  assertion  that  while  all  the  selections  are  made 
centrally,  most  are  still  made  locally.     That  is  to  say  that  all  the 
selections  made  locally  are  those  that  would  have  been  made  by  central 
selection and do not become final until confirmed by the central board.

**************************** 

One  of  the  problems  confronting  the  central  selection  boards 
has been the high incidence of drop-outs among applicants already
selected. About 22% of selectees did not subsequently enroll; this
has kept the central selection boards in action throughout the summer
months  and  up  to  the  starting  day  of  class  and  even  beyond.    A  great 
deal  of  conjecture  occurred  about  why  this  should  be  so.     One  hypothesis 
was  that  the  drop-outs  were  occurring  among  the  higher  quality  applicants 
who  had  wider  and  better  alternatives  than  their  less  talented  fellows 
and were being distracted by offers from competitors. As reasonable as
this hypothesis sounded, it has proved to be largely untrue. Drop-outs
are about equal in quality to those who remain as may be seen from the
data in Table 3.


TABLE 3

Comparison of Applicants
(Applied, Selected, and Selected/Dropped)

                          No.     OQC     GPA     *SAT-EQ    QIS

a. Applied               4613    50.6    2.79     1090       77
b. Selected              4470    52.0    2.81     1098       78
c. Selected/Dropped      1234    50.0    2.83     1089       76

However, the term "dropped," as employed here includes all selectees
who for some reason after selection failed to enter the program when
classes began. The failure may have been totally involuntary as would
be the case with academic eliminations from the institution, medical
disqualifications, or headquarters disapproval of a request to waive some
disqualifying characteristic (arrest, drug-abuse, etc.). Some drop-outs
might be called semi-voluntary such as students who could not gain
entry in the category desired (pilot or navigator) or who failed to
receive an anticipated scholarship. Other drop-outs are entirely voluntary
such as those who simply lose interest, change their minds about enrolling,
or who enroll in an Army or Navy program.

Reasons  for  drop-out  insofar  as  they  could  be  determined  and  the 
various  quality  measures  associated  with  each  are  as  detailed  below: 


TABLE 4
(Reason for Drop-Out)

                             No.      %      OQC    GPA    *SAT-EQ   QIS

a. Academic                  100     8.1%    51    2.43     1086     73
b. Physical                  159    12.9%    49    2.73     1070     73
c. Quota Competition          30     2.4%    38    2.69     1023     70
d. HQ Disqualified            25     2.0%    46    2.65     1088     74
e. Outside Competition       216    17.5%    54    2.93     1097     78
f. Field Tng Elim             87     7.1%    51    2.85     1089     75
g. Personal                  582    47.2%    50    2.91     1093     76
h. Scholarship                11      .9%    55    2.72     1120     78
i. Other + Unknown            24     1.9%    52    2.68     1105     76
j. Overall                  1234    100%     50    2.83     1089     76

*SAT-EQ  includes  actual  SAT  scores  or  ACT/OQC  conversions  to  SAT.  The 
mean  is  computed  without  distinction  between  actual  and  converted  scores. 
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The  two  largest  groups  of  drop-outs  are  those  citing  "Outside 
Competition" and those citing "Personal" as the reason for drop-out.

These  are  also  the  two  groups  among  which  the  reason  for  drop-out  is 
strictly  voluntary.    Therefore,  these  drop-outs  merit  more  detailed 
examination.    Their  characteristics  are  outlined  in  Tables  5  and  6. 

TABLE 5

(Drop-Outs - Outside Competition)
Received a better offer from:

                               No.      %      OQC    GPA    SAT-EQ   QIS

a. Civilian Source             140    64.8%    54    2.96     1103    78
b. Other Military               54    25.0%    56    2.85     1097    77
c. Other Government Agency       8     3.7%    39    2.94     1034    74
d. Unknown/Other                14     6.5%    48    3.00     1070    76
Total                          216    100%     54    2.93     1097    78

TABLE 6

(Drop-Outs - Personal Reasons)

                               No.      %      OQC    GPA    SAT-EQ   QIS

a. Lost Interest               360    61.8%    50    2.91     1098    77
b. Peer Pressure                 4      .7%    35    3.23     1110    79
c. Family Problems              62    10.7%    43    2.90     1049    74
d. Financial Problems           41     7.0%    45    2.88     1066    73
e. Active Duty (Not Released)    9     1.5%    47    2.73     1059    75
f. Girl/Boy Friend              23     4.0%    42    2.89     1083    75
g. Religion                      7     1.2%    58    2.88     1124    81
h. Unknown                      29     5.0%    60    2.85     1118    77
i. Other                        47     8.1%    58    2.94     1126    80
Total                          582    100%     50    2.91     1093    76

The  figures  in  Table  5  by  no  means  define  AFROTC's  problems  with  compe- 
tition from  outside  agencies.    The  loss  of  140  prime  selectees  to  civilian 
competitors  is  regrettable.    What  we  do  not  know  is  how  many  were  lost 
before  they  ever  applied  for  selection. 

What  does  become  apparent  under  analysis  of  the  data  is  that  360  high 
quality  selectees  dropped-out  simply  because  they  "lost  interest"  between 
the  time  of  selection  and  the  first  day  of  class.     If  the  individual 
detachments,  by  a  vigorous  follow-up  campaign,  could  succeed  in  reducing 
this figure by half, they might succeed in reducing the overall drop-rate
from  28%  to  24%.    Getting  the  rate  much  lower  than  24%  would  not  seem  a 
realistic  goal. 
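The drop-rate figures quoted above follow from simple arithmetic, sketched below. The sketch assumes the counts read as 4,470 selected and 1,234 dropped (Table 3) and 360 citing lost interest (Table 6); the function name is illustrative.

```python
def drop_rate(dropped, selected):
    """Percentage of selectees who did not enter the program."""
    return 100.0 * dropped / selected

selected = 4470        # Table 3, row b (assumed reading)
dropped = 1234         # Table 3, row c
lost_interest = 360    # Table 6, row a

current_rate = drop_rate(dropped, selected)
# Suppose follow-up retains half of those who "lost interest"
reduced_rate = drop_rate(dropped - lost_interest / 2, selected)

print(round(current_rate), round(reduced_rate))  # 28 24
```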


Characteristics  of  the  FY  78  WPSS  selectees  (for  FY  80  graduation) 
are as displayed in Tables 7 and 8. Table 7 shows the mean scores by
sex and race of fall 78 enrollees on the Officer Quality Composite and
Quantitative Composite of the Air Force Officer Qualifying Test and their
mean grade-point-averages on the four-point scale. Table 8 shows the same
information by enrollment category: pilot, navigator, missile specialist,
science-technical, and other.

TABLE 7

(Mean Standardized Test Scores and Grades of WPSS Selectees
for FY 78, by Sex and Race)

              Overall    Male    Female   Caucasian   Black   Other

Total "N"       3211     2691      520       2770       330     111
SAT             1099     1105     1065       1119       949    1032
OQC               52       54       44         56        26      36
QUANT             55       57       46         58        33      46
GPA             2.80     2.77     2.91       2.81      2.69    2.76


TABLE 8

(Mean Standardized Test Scores and Grades of WPSS Selectees
for FY 78, by Category)

                                                      Tech/     Non-
              Overall    Pilot   Navigator  Missile  Science    Tech

Total "N"       3211      848       285       298       902      878
SAT             1099     1141      1092      1064      1152     1017
OQC               52       61        50        47        61       37
QUANT             55       65        56        48        68       36
GPA             2.80     2.84      2.58      2.68      2.90     2.76

Table 9 shows the improvement in Officer Quality Score since the
pre-WPSS days in fiscal year '76.


TABLE 9

                              OQC     SAT

a.  FY 76 (Pre-WPSS)           41
b.  FY 77 (1st Yr WPSS)        49     1087
c.  FY 78 (2nd Yr WPSS)        52     1099


AFROTC is justified in concluding that the system has
helped identify quality applicants and allowed AFROTC to
select the highest quality applicants. AFROTC intends to
continue to use, study, and refine the system for the
foreseeable future.
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ABSTRACT 


The Defense Language Aptitude Battery (DLAB)
was introduced in 1977 for use by the Defense
Language Institute Foreign Language Center
to screen potential candidates for training
in over thirty foreign languages. Its predictive
validity for success in foreign language
training is higher than that of its predecessor
test and of two commercially available language
aptitude tests. Differential prediction by
language was studied as a part of the validation
research. The hypothesis that DLAB would predict
differentially by language was not sustained.
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THE  DEFENSE  LANGUAGE  APTITUDE  BATTERY 


The  Defense  Language  Institute  Foreign  Language  Center 
is  located  at  the  Presidio  of  Monterey,  California,  and 
operates  under  a  direct  charter  from  the  Department  of 
Defense  which  names  the  Department  of  the  Army  as 
Executive  Agent  for  operation  of  the  school. 

At  Monterey  some  thirty  foreign  languages  (expandable  to 
fifty  languages)  are  offered  to  a  group  of  some  2,200 
students  at  any  given  time.    We  graduate  between  three 
and  four  thousand  students  annually.    Our  students  are 
predominantly  officer  and  enlisted  personnel  from  the 
four  military  branches  plus  a  smattering  of  civilian 
students from other federal agencies. Spouses of students
are also invited to attend class on a space-available basis.


Like  most  military  schools,  the  Institute  is  concerned 
with  cost-effective  operations  while  producing  the  best- 
qualified  linguists  possible.    One  method  employed  is  to 
attempt to predict, and therefore control, student attrition
for academic reasons. The Defense Language Aptitude
Battery  (DLAB)  is  used  for  this  purpose. 

Each  military  branch  has  its  own  recruiting  criteria  for 
physical  and  mental  standards.    Whether  the  individual 
is  a  first-term  service  man  or  woman,  or  someone  with  a 
number  of  years  of  military  service,  candidates  for 
foreign  language  training  at  Monterey  are  required  to 
take  the  DLAB.    General  and  flag  officers  are  excused 
from  this  requirement. 

Exercising his authority over technical control of the
Defense Foreign Language Program, the Commandant, DLIFLC,
sets the minimum scoring criterion (cutting score) on DLAB
that represents eligibility for training. Waivers may be
granted at the discretion of the Commandant. The test is
usually administered at Armed Forces Examining and Entrance
Stations (AFEES), or at Lackland Air Force Base, Texas.
Some testing is done at Monterey.


The  views  of  the  author  do  not  purport  to  reflect  the  official 
position of the Department of Defense, the United States Army,
or  the  Defense  Foreign  Language  Center. 
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The  Defense  Language  Aptitude  Battery  is  the  successor  to 
the Defense Language Aptitude Test (DLAT), which was used
for  over  twenty  years.     DLAB  was  implemented  in  the  summer 
of  1977  and  the  use  of  DLAT  rescinded  at  that  time.  The 
two  tests  differ  in  several  significant  ways. 

While  both  tests  are  paper  and  pencil  tests  using  a 
multiple-choice format, DLAB also contains an audio component.
DLAT did not. This was incorporated into the test
design  because  of  the  teaching  methodology  used  at  Monterey. 
This  is  predominantly  the  audio-lingual  method,  which  places 
a  considerable  burden  on  listening,  as  opposed  to  cognitive- 
code,  grammar-translation  and  other  traditional  foreign 
language  teaching  methodologies. 

The old DLAT was prepared in two alternate forms. The construction
and equating of alternate test forms is an expensive
and time-consuming operation. The purpose of constructing
alternate forms is to mitigate the problems of
compromise and practice effect when personnel are retested.
Experience on DLAT indicated that few individuals ever
requested a second test administration. Further, the unique
design of DLAB is such that compromise short of possessing
the answer key would be difficult. As an additional safeguard,
test length was extended from fifty-nine items on
DLAT to 119 items on DLAB.

DLAT  required  about  thirty  minutes  to  administer  and  DLAB 
requires about ninety minutes. This has caused some difficulty
at the AFEES, where each processing minute is very
important.    We  are  now  conducting  item  analysis  on  a  sample 
of  approximately  2,000  answer  sheets  to  investigate  the 
possibility  of  reducing  test  length  without  disturbing 
validity  or  reliability. 

DLAB enjoys one great advantage over DLAT: the
meticulous field validation of the test, its revision, and
subsequent cross-validation using external criteria on a
large population of our students. The result is that DLAB
has a correlation with student achievement of .50, as
opposed to .35 for DLAT.

The  size  of  the  population  taking  the  test  at  AFEES  and 
military  bases  worldwide  is  known  to  us,  but  fluctuations 
in  the  size  of  the  population  are  dependent  upon  variables 
not  under  our  control.    With  known  "pass  rates"  associated 
with  a  given  cutting  score  we  can  establish  a  fair  idea 
of  the  student  eligibility  pool  for  that  cutting  score. 
An abrupt decrease in the population passing the test without
a corresponding decrease in linguist requirements would
suggest that the cutting score be lowered. While this
would increase the relative eligibility pool, it would also
be accompanied by a rise in academic attrition. Thus, we
attempt to peg the cutting score at a point that will permit
the appropriate number of individuals to become eligible
while maintaining the lowest possible predicted academic
attrition rate.

Periodically, DLAB scores are compared with classroom performance
by our students using the final course grade as
the criterion. Combined with input numbers and attrition
data, we can then establish an optimum cutting score. Based
upon simulated prediction, the current raw cutting score of
sixty produces an eligibility pool of approximately twenty
per cent of those actually tested. In actual performance
the eligibility yield has been slightly higher, between
twenty-three and twenty-four per cent. We suspect that
this small fluctuation is due to changes in the total test
population. The groups upon whom the test was normed back
in the early nineteen seventies were almost exclusively male
and predominantly white. In recent years the recruit population
has included growing numbers of females and blacks.
Earlier this year, at the request of the Defense Department,
we performed a preliminary study on the limited number of
answer sheets then on hand to determine if there were significant
differences in the test population in terms of
military branch, ethnic origin and gender. That information
did indicate differences. In general, DLAB scores
were slightly higher for Navy personnel, whites and females.
The first full cycle of these individuals is now completing
their foreign language training. We are keenly aware
of the social, legal and cost-effective operational implications
of these preliminary findings. When adequate numbers
of personnel have completed the training cycle we will
be obligated to investigate these variables further.

As a selection, classification and screening device, DLAB
does not operate in a vacuum. There are many other related
factors. For instance, the military branches may impose
minimum attainment standards on the Armed Services Vocational
Aptitude Battery (ASVAB) before individuals are permitted
to take DLAB. Auditory acuity is only grossly
examined now. We would like to improve the way this is
measured. Our students are, by and large, volunteers.
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This  surely  impacts  on  motivation  and  attitudes.    Many  of 
our  graduates  must  enter  fields  of  work  with  sensitive 
security  requirements.    Therefore,  their  backgrounds  must 
indicate  a  high  probability  of  being  granted  a  security 
clearance before a lot of money is invested in their training.
Other than aptitude and learning capacity, these
variables  are  not  subject  to  the  control  of  DLIFLC. 

One other variable interested us when the test was
designed. We hoped that the special features of the test
might permit some indication of differential aptitude across
language families or, perhaps, for individual languages
themselves. This proved to be a phantom, probably due to
a series of uncontrolled, or even unrecognized, variables.
For example, American English is the native language for
most of our students. Based upon both lengthy experience
and intuition, we generally expect the Romance languages
to be easiest for our students to learn, the Slavic group
somewhat more difficult, and the Arabic and Oriental languages
the most difficult. Unfortunately, there are
numerous linguistic differences within these generalities
that may enhance or impede the learning process for the
native American English speaker. To cite but two examples,
our students have considerable problems with tone languages
(e.g., Thai, Chinese) and those with unique writing systems
(e.g., Arabic, Japanese). And, despite a commitment to a
rather singular teaching methodology and environment, each
course of instruction differs in many ways from the others
in areas not related to the language itself.

The  initial  establishment  of  the  cutting  score  for  DLAB  was 
arbitrary,  but  not  set  without  considerable  information. 
My colleague, Mr. Thain, will discuss the techniques
employed  as  well  as  the  rationale  for  using  standard  scores 
for  DLAB,  instead  of  raw  scores,  which  were  used  with  DLAT. 

For  those  of  you  who  would  like  to  further  examine  both  the 
DLAB  design  concept  and  the  validation  procedures,  we  have 
a limited number of booklets here at the front table containing
the technical reports.

I  thank  you  for  your  attention  and  will  be  happy  to  respond 
to  any  questions  you  may  have. 
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DEFENSE  LANGUAGE  APTITUDE  BATTERY  (DLAB) 
DESCRIPTION 

PART I   - BIOGRAPHICAL INVENTORY

PART II  - RECOGNITION OF STRESS PATTERNS

PART III - FOREIGN LANGUAGE (FL) GRAMMAR

PART IV  - FL CONCEPT FORMATION

ACTUAL TEST

Items            119

Practice Items     7

Total Items      126

CUT OFF SCORES FOR ENTRY

RAW    STANDARD
 60       89
DEFENSE LANGUAGE APTITUDE BATTERY
PREDICTIVE VALIDITIES FOR INDIVIDUAL LANGUAGE COURSES

ZERO ORDER CORRELATIONS

                       DLAB PREDICTIVE        DLAT PREDICTIVE
                          VALIDITY               VALIDITY
LANGUAGE                 N         r            N         r

ARABIC                  153      .400          140      .210
CHINESE-MANDARIN         85      .624           75      .244
CZECH                    86      .640           70      .503
FRENCH                   72   [illegible]       43      .311
GERMAN                  106      .428           99      .228
KOREAN                   92      .547           78      .467
RUSSIAN                  86      .678           73      .570
SPANISH                  83      .594           66      .507
THAI                     51      .521           27      .398
VIETNAMESE               38      .433           37      .040

TOTALS                  852      .541          708      .348
COMPUTER SIMULATION OF
DLI SELECTION PROBLEM

DLAB          BASE SAMPLE                     AVERAGE
CUTTING       INELIGIBLE      ATTRITION       GRADE
SCORE             %               %             %

  40             42.2            17.8          78.7
  42             46.5            16.7          79.[illegible]
  44             50.7            15.3          79.6
  46             54.6            14.0          80.0
  48          [illegible]        12.7          80.5
  50             62.9            11.5          81.0
  52             66.9            10.5          81.4
  54             70.6            10.0          81.9
  56             74.0             8.6          82.3
  58             77.2             7.7          82.8
  60             80.3             7.1          83.3
  61             81.7             6.7          83.6
  62             83.1             6.1          83.9
  63             84.4             5.6          84.2
  64             85.5             5.2          84.4
  65             86.7             4.8          84.7
  66             87.9             4.4          85.0
  67             88.9             4.0          85.2
  68             90.0             3.9          85.5
  69             90.1             3.5          85.9
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DEFENSE  LANGUAGE  INSTITUTE 
FOREIGN  LANGUAGE  CENTER 


DEFENSE  LANGUAGE  APTITUDE  BATTERY 
Current  (April  1978)  Statistics 


                                                Standard
                            N        Mean*      Deviation

Total Test Population     24,633     51.6         15.5
Army Sample                  870     52.0         17.2
Air Force Sample           1,560     51.3         14.4
Navy Sample                  333     55.2         17.8
Marine Sample             (Too Small to Include)

*Maximum Possible Score = 119


Test Reliability Estimate (N=4,000)   .91
(Kuder-Richardson Formula 21)


Passing Rate
(Raw Score Cut-Off = 60)

                     Number of        Number     Eligible
Test Population      Answer Sheets    Pass       Per Cent

All Services            76,837        14,534      24.9%
Army                    43,428         6,686      15.4%
Air Force               33,085         7,744      23.4%
Navy                       310           100      32.2%
Marines                     14             4      28.6%
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Correlations  of  Predictors  With  Average  Grade 
For  1969  -  1971  Sample  (879  Cases)  in  12  Languages 


     Predictor                                          Correlation

 1   Age                                                   .029
 2   Years of Education                                 [illegible]
 3   Defense Language Aptitude Test (DLAT)                 .373

     Modern Language Aptitude Test (MLAT):
 4   MLAT - Part 1: Number Learning                        .155
 5   MLAT - Part 2: Phonetic Script                        .307
 6   MLAT - Part 3: Spelling Clues                         .324
 7   MLAT - Part 4: Words in Sentences                     .359
 8   MLAT - Part 5: Paired Associates                      .243
 9   MLAT - Auditory (Total 4 & 5)                         .266
10   MLAT - Paper and Pencil (Total 6, 7, & 8)             .413
11   MLAT - Total                                       [illegible]

     Pimsleur Language Aptitude Battery (PLAB):
12   PLAB - Part 1: Past Grades (Biographical)             .258
13   PLAB - Part 2: Interest                               .113
14   PLAB - Part 3: Vocabulary                             .267
15   PLAB - Part 4: Language Analysis                      .263
16   PLAB - Part 5: Sound Discrimination                   .198
17   PLAB - Part 6: Sound Symbol Assoc.                    .204
18   PLAB - Auditory (Total 16 & 17)                       .253
19   PLAB - Paper and Pencil (Total 14 & 15)               .319
20   PLAB - Linguistic (Total 18 & 19)                     .357
21   PLAB - Total                                          .405

22   Otis-Lennon (IQ)                                      .272
23   Need for Social Approval Scale                       -.078
24   Taylor Manifest Anxiety Scale                        -.040
25   Defense Language Aptitude Battery                     .431
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SELECTION  DECISIONS  FROM  PERSONNEL  TESTS 


Mr.  John  W.  Thain 
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Evaluation  Specialist 


a  paper  presented  to  the 
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October  1978 
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ABSTRACT 


From  a  strictly  theoretical  point  of  view,  the  best  method 
for  determining  the  cutoff  score  for  an  aptitude  test  is 
to  administer  the  test  during  a  field  trial,  but  randomly 
select  personnel  for  training  regardless  of  score  on  the 
aptitude  test,  and  then  observe  performance  and  attrition 
rates  in  terms  of  aptitude  test  score.     However,  such  a 
method  is  often  impractical  in  a  military  setting.  Monte 
Carlo techniques can be used to simulate random selection
from  the  "unrestricted"  population  to  which  the  aptitude 
test  is  administered.     A  computer  program  designed  by  the 
author  uses  test  and  criterion  parameters  and  cutting 
scores,  correlation  coefficient,  sample  size,  and  number 
of  samples  to  be  drawn  as  inputs,  and  calculates  decision 
classification  rates  across  samples  and  for  combined 
samples. 
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MONTE  CARLO  COMPUTER  PROGRAMS  FOR  SIMULATING 
SELECTION  DECISIONS  FROM  PERSONNEL  TESTS 


My colleague at DLIFLC in Monterey, Mr. Henderson, has
presented a paper on the development of the Defense Language
Aptitude Battery (DLAB). My work on a Monte Carlo program
for simulating personnel test decisions was done in conjunction
with the development of this language aptitude test.
The Monte Carlo program has broader application for all
personnel tests, not just the language
aptitude test. However, our work on DLAB provides concrete
illustrations of how the program can be used. I will start
by explaining some relatively basic concepts and terms and
build up to more complicated ideas.

Since I am not sure about the background of the
audience, I am not sure how to sequence the presentation of
these  basic  concepts.     I  hope  the  main  ideas  eventually  get 
through  and  that  no  one  feels  distressed  because  he  has 
difficulty  in  following  the  transitions  from  point  to  point. 

The  table  at  Appendix  A  illustrates  a  basic  problem  in 
making  personnel  decisions.    No  language  aptitude  test  is 
perfect  enough  to  insure  that  everyone  scoring  higher  than  a 
certain score will succeed in language training while everyone
scoring lower will fail. In practice, some examinees
pass  the  test  and  then  fail  the  training.    Other  examinees 
fail  the  test,  but  could  have  passed  the  training  if  they 
had  been  given  the  chance. 

If the minimum passing score is extremely low, then we
will have a preponderance of problems of the first kind
(examinees passing the test but subsequently failing training),
but very few problems with screening out nonpassing
examinees who could have succeeded in training. If the
cutting score is very high, we will have only minor problems
with test passers subsequently failing training but
more significant problems with the unwanted screening out
of nonpassers who failed to achieve the high passing score
but who could have succeeded in training.

Let  us  call  the  first  type  of  error  (where  test 
passers  fail  in  training)  false  positive  errors,  and  the 
second  type  of  error  (where  examinees  are  screened  out  who 
could  have  succeeded  in  training)  false  negative  errors. 
As we can see, the proportion of false positives will
increase as the cutoff score falls, and the proportion of
false  negatives  will  increase  as  the  cutting  score  rises.  By 
choosing  a  given  cutting  score  we  choose  a  certain  tradeoff 
between  false  positives  and  false  negatives. 

Let  us  assume  a  simple  scenario  involving  a  predictor 
test and a criterion test. All examinees taking the predictor
test also take the criterion test. We want to
generalize these results to a future situation in which we
have established a cutoff score for the predictor test and a
cutoff score for mastery on the criterion test.

Let us make some assumptions that will allow us to
conduct a simulation study. Assume that we know the population
parameters listed on the slide: mean and standard
deviation of predictor and criterion tests and the correlation
between predictor and criterion. Assume a bivariate
normal distribution. Our computer program can easily
utilize a random number generator and then an inverse normal
distribution function to generate a normal distribution of
mean $\mu_x$ and standard deviation $\sigma_x$. A corresponding
set of random numbers can be generated and subjected to
the inverse normal distribution function and then plugged
into the formula in Appendix B.

This formula will generate a criterion distribution with
a mean of $\mu_y$, standard deviation of $\sigma_y$, and correlation
of $\rho$ with the predictor. If z scores are used throughout
the computation, the formula is much simpler. It was also
simpler to write the computer program using z scores and
convert to raw scores only when output was required.
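
The generation step just described can be sketched in a few lines of Python. This is only an illustration, not the author's FORTRAN program: the Appendix B formula is not reproduced here, so the sketch assumes the standard z-score relation z_y = rho * z_x + sqrt(1 - rho^2) * e, where e is an independent standard normal deviate. The function names (open_uniform, simulate_pairs, pearson) and the seed are hypothetical; the parameter values are those quoted for Appendix C.

```python
import random
from statistics import NormalDist


def open_uniform(rng):
    """Uniform draw on (0, 1); the inverse normal is undefined at exactly 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def simulate_pairs(n, mu_x, sd_x, mu_y, sd_y, rho, seed=1978):
    """Return n (predictor, criterion) raw-score pairs.

    Work is done in z scores and converted to raw scores only at the end.
    Assumed relation (not taken from Appendix B):
        z_y = rho * z_x + sqrt(1 - rho**2) * z_e
    """
    rng = random.Random(seed)
    inv = NormalDist().inv_cdf            # inverse standard normal function
    residual = (1.0 - rho ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        z_x = inv(open_uniform(rng))      # predictor z score
        z_e = inv(open_uniform(rng))      # independent error z score
        z_y = rho * z_x + residual * z_e  # criterion z score
        pairs.append((mu_x + sd_x * z_x, mu_y + sd_y * z_y))
    return pairs


def pearson(pairs):
    """Pearson correlation of the generated pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


pairs = simulate_pairs(20000, 43.6, 18.0, 69.3, 14.6, 0.63)
print(round(pearson(pairs), 3))  # should land close to .63
```

With a sample of this size the obtained mean, standard deviation, and correlation should fall close to the requested values, which is the behavior the talk goes on to describe.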

So we now have a bivariate normal distribution with
given parameters. At least we almost have such a distribution.
No system for generating random numbers is totally
random. The random number functions I have used have a very
slight bias. The more pairs of random numbers generated, up
to a certain point, the closer the distribution generated
approaches population parameters. Since we are dealing with
a computer simulation, we can easily generate 50,000 or more
pairs of random numbers, and the amount of bias is extremely
small.

Appendix C is an output from a computer program I have
written. Note the parameter values of 43.6, 69.3, 18.0,
14.6, and .63, and the actually obtained values of 43.4969,
69.27215, 17.98392, 14.56758, and .63320. Corresponding
z values are very close to a mean of 0 and standard
deviation of 1 as we would expect: -.00573, -.00190,
.99911, and .99778. By using the random number function and
the inverse normal distribution function, we come very close
to a bivariate normal distribution with the desired
parameters.
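
As a quick check on the z values quoted above, the obtained raw-score statistics can be re-expressed in units of the requested parameters. The sketch below uses only the three figures that are unambiguous in the text (the predictor mean and the two standard deviations); the variable names are hypothetical.

```python
# Requested population parameters (from the text).
pred_mean, pred_sd, crit_sd = 43.6, 18.0, 14.6

# Values actually obtained by the simulation (from the text).
obt_pred_mean, obt_pred_sd, obt_crit_sd = 43.4969, 17.98392, 14.56758

# A mean is standardized as (obtained - requested) / requested SD;
# a standard deviation is standardized as obtained SD / requested SD.
z_pred_mean = (obt_pred_mean - pred_mean) / pred_sd
z_pred_sd = obt_pred_sd / pred_sd
z_crit_sd = obt_crit_sd / crit_sd

print(f"{z_pred_mean:.5f} {z_pred_sd:.5f} {z_crit_sd:.5f}")
```

The three results agree with the -.00573, .99911, and .99778 reported in the printout.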

The next step in the simulation process is to add the
considerations of predictor cutting score and criterion
cutting score we mentioned earlier. Every predictor score
is matched to a criterion score. We mentioned earlier two
undesirable combinations of predictor and criterion scores,
which we labeled false positives and false negatives. Two
other desirable combinations exist: combinations in which
the aptitude test is doing what it is supposed to do. One
combination is when the examinee passes the test and
passes his training; we call these cases valid positives.
Another combination is that the examinee fails the test and
would have failed the training also; we call these cases
valid negatives.

Given our bivariate distribution and our predictor and
criterion cutoffs, every pair of predictor and criterion
scores falls into one of these four categories. As the
computer program generates its bivariate distribution, it
simultaneously counts the number of cases in each of the
four categories, given the cutoffs specified.
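
The four-way count can be illustrated with a short Python sketch. It is not the original program; the function name count_decisions is hypothetical, and treating a score exactly equal to the cutoff as a pass is an assumption, since the text does not say how ties are handled. The criterion cutoff of 70 in the example is likewise an assumed value chosen only for illustration.

```python
def count_decisions(pairs, pred_cut, crit_cut):
    """Tally valid/false positives and negatives for (predictor, criterion) pairs.

    Assumption: a score equal to the cutoff counts as passing.
    """
    counts = {"VP": 0, "FP": 0, "VN": 0, "FN": 0}
    for predictor, criterion in pairs:
        passed_test = predictor >= pred_cut
        passed_training = criterion >= crit_cut
        if passed_test:
            # Selected: valid if training is passed, false positive otherwise.
            counts["VP" if passed_training else "FP"] += 1
        else:
            # Rejected: false negative if training would have been passed.
            counts["FN" if passed_training else "VN"] += 1
    return counts


# Four hand-made cases, one per category, with a predictor cutoff of 60
# and a hypothetical criterion cutoff of 70.
example = [(65, 80), (65, 50), (40, 80), (40, 50)]
print(count_decisions(example, pred_cut=60, crit_cut=70))
# -> {'VP': 1, 'FP': 1, 'VN': 1, 'FN': 1}
```

Because every pair falls into exactly one category, the four counts always sum to the number of cases generated.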

The results are shown in the computer printout in
Appendix D. In this case the predictor cutoff of 60 is
almost a full standard deviation above the predictor mean
of 43.6. The computer program generated 25,000 cases,
4502 of which passed the predictor test and 20,498 of which
failed the predictor test. As explained earlier, the
"passes" are further divided into valid and false positives
and the "fails" into valid and false negatives, depending
on whether the criterion cutoff score was achieved. In
this printout, for a given predictor and criterion cutoff score we
have the number of positive and negative cases and the
number of cases in each quadrant. We also have the mean
predictor and criterion score in each of these categories.

If the distribution stays constant as the predictor
cutoff rises, fewer people pass. Some of the correctly
classified valid positives that fall below the new cutoff become
misclassified as false negatives. However, some of the false
positives that fall below the new cutoff become properly classified
as valid negatives. This is the kind of tradeoff we
mentioned earlier.

It  is  interesting  to  note  that  as  the  predictor  cutoff 
rises  in  this  example,  the  mean  scores  for  both  predictor  and 
criterion  in  all  quadrants  rise.     This  occurs  because  there 
is  a  relatively  high  correlation  between  predictor  and 
criterion.     Increasing  the  predictor  cutoff  adds  cases  with 
a  higher  predictor  score  to  the  category  of  negative  cases, 
so that both the predictor and criterion means for negative
cases  rise.     The  same  cases  that  are  added  to  the  negative 
categories  were  the  lowest  predictor  scores  from  the  positive 
categories  so  that  the  predictor  and  criterion  means  for  the 
positive  categories  also  rise  for  the  remaining  cases  with 
higher  predictor  scores.     In  parentheses  on  the  printout  we 
also  have  the  percentage  of  positive  and  negative  cases 
passing  the  criterion  for  each  of  the  predictor  cutoffs. 
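
The effect of raising the predictor cutoff can be seen in a small simulation. The sketch below is an illustration under stated assumptions rather than the original program: it uses Python's random.gauss in place of the random-number-plus-inverse-normal approach, the criterion cutoff of 70 is a hypothetical value, and the names make_pairs and summarize are invented for the example.

```python
import random
from statistics import mean


def make_pairs(n, mu_x=43.6, sd_x=18.0, mu_y=69.3, sd_y=14.6, rho=0.63, seed=7):
    """Generate correlated (predictor, criterion) pairs via z scores."""
    rng = random.Random(seed)
    residual = (1.0 - rho ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        z_x = rng.gauss(0.0, 1.0)
        z_y = rho * z_x + residual * rng.gauss(0.0, 1.0)
        pairs.append((mu_x + sd_x * z_x, mu_y + sd_y * z_y))
    return pairs


def summarize(pairs, pred_cut, crit_cut):
    """Counts and mean scores for positive (selected) and negative cases."""
    pos = [(x, y) for x, y in pairs if x >= pred_cut]
    neg = [(x, y) for x, y in pairs if x < pred_cut]
    return {
        "n_pos": len(pos),
        "n_neg": len(neg),
        "false_pos": sum(1 for _, y in pos if y < crit_cut),
        "false_neg": sum(1 for _, y in neg if y >= crit_cut),
        "mean_pred_pos": mean(x for x, _ in pos),
        "mean_pred_neg": mean(x for x, _ in neg),
        "mean_crit_pos": mean(y for _, y in pos),
        "mean_crit_neg": mean(y for _, y in neg),
    }


CRIT_CUT = 70.0  # hypothetical criterion cutoff, for illustration only
pairs = make_pairs(5000)
for cut in (60, 68):
    s = summarize(pairs, cut, CRIT_CUT)
    print(cut, s["n_pos"], s["false_pos"], s["false_neg"],
          round(s["mean_pred_pos"], 1), round(s["mean_pred_neg"], 1))
```

For a fixed set of cases, moving the cutoff from 60 to 68 can only shrink the number of positives and false positives and can only enlarge the number of false negatives, while the predictor means of both groups rise, as described above.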

We have generated 25,000 cases so that the bivariate
distribution would have almost exactly the parameters desired.
However, at DLIFLC we don't admit students to training
in groups of 25,000. Our computer program enables us to
break the big sample into a number of smaller samples of
any size we choose. This enables us to view the effect of
sampling error in circumstances similar to everyday operating
conditions. In our case we have drawn 250 samples of 100
each from the 25,000 cases. To establish a frame of reference
for thinking about sampling error, we have printed the
standard errors of the parameters for 25,000 cases and for
100 cases in the first two columns at the top and the left
of the output page in Appendix C.

At  the  lower  left  of  the  page  we  have  the  mean  values 
of  the  small  sample  characteristics  across  groups.  Of 
course  the  mean  of  the  means  is  the  same  as  the  grand  mean, 
but  the  mean  of  the  other  characteristics  varies  slightly 
from  the  values  for  the  whole  sample.     On  the  lower  right 
we  have  the  standard  deviation  of  these  characteristics 
across  groups.     For  example,  with  this  predictor  grand  mean 
and this standard deviation of predictor sample means, we
would  expect  to  find  68%  of  the  sample  means  to  fall  between 
about  41.7  and  45.3. 
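
The 41.7 to 45.3 interval can be approximated from the usual standard error of the mean, SD / sqrt(n). The sketch below assumes the population standard deviation of 18.0 in place of the observed standard deviation of sample means printed in Appendix C, which is not reproduced here; the function name is hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Theoretical standard error of a sample mean."""
    return sd / math.sqrt(n)


se_100 = standard_error_of_mean(18.0, 100)      # samples of 100 cases
se_25000 = standard_error_of_mean(18.0, 25000)  # the full 25,000 cases

grand_mean = 43.4969  # obtained predictor mean quoted in the text
low, high = grand_mean - se_100, grand_mean + se_100
print(f"{low:.1f} to {high:.1f}")  # -> 41.7 to 45.3
```

About 68% of sample means fall within one standard error of the grand mean, which reproduces the range quoted above; for the full 25,000 cases the standard error shrinks to roughly 0.11.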
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Finally,  for  each  combination  of  predictor  and 
criterion  cutoff  scores  there  is  an  output  page  like 
Appendix E. This page gives the average number of cases
in each of the small samples for each of the four
categories and also the standard deviation of the number
of cases across all the small samples. For purposes of
comparison,  a  page  is  shown  where  the  predictor  cutoff 
has  been  raised  from  60  to  68. 

In  the  preceding  example,  we  have  only  talked  about 
changing the predictor cutoff. It is just as feasible to
change the criterion cutoff, but such examples were not
shown in order to keep the presentation simple.

In  summary,  this  program  employs  a  Monte  Carlo 
technique  to  generate  a  bivariate  normal  density  function. 
The five parameters on which the function is based are the
predictor and criterion means, the standard deviations, and
the correlation coefficient. The program treats these parameters
as population values from which repeated samples are
drawn.     Individual  cases  are  then  compared  to  predictor  and 
criterion  cutting  scores;  the  rates  and  distributions  of 
valid  positives,  false  positives,  valid  negatives,  and  false 
negatives  are  then  computed. 

The predictor and criterion cutting scores can be
automatically incremented to produce expectancy tables. The
program utilizes a random number generator as input to an
inverse normal function taken from STATPAK (Computer
Sciences Corporation, 1972) to create the
distributions. It has been adapted for a UNIVAC 1108 with a
FORTRAN V compiler.

The  following  output  is  generated: 

1.  Standard errors of parameters.

2.  Obtained  means,  standard  deviations,  and  correlation 
coefficient  for  combined  sample. 

3.  Standard  deviation  of  means,  standard  deviations, 
and  correlation  coefficients  across  samples. 
4.  As relates to the four decision categories (VP, FP,
VN, FN):

a.  Average number of cases in combined samples.

b.  Average number of cases in each sample.

c.  Average percentage of cases.

d.  Standard deviation of number of cases across
samples.

e.  Predictor  mean  score. 

f.  Criterion  mean  score. 

5.  Proportion  of  successful  selectees. 

6.  Mean  predictor  score  for  selectees. 

7.  Mean  criterion  score  for  selectees. 

8.  Proportion  of  successful  rejections  (assuming 
rejections  were  given  the  opportunity  to  succeed) . 

9.  Mean  predictor  score  for  rejections. 
10.    Mean  criterion  score  for  rejections. 
APPENDIX A

PAM: A METHODOLOGY FOR PREDICTING
AIR FORCE PERSONNEL AVAILABILITY


ABSTRACT 


This  paper  describes  a  methodology  for  projecting  the 
career  transition  activity  of  Air  Force  personnel  to  predict 
their  future  availability.  It  includes  a  personnel  availability  an- 
alysis model  (PAM),  application  techniques,  and  a  personnel 
data  bank. 

The  cost  significance  of  weapon  system  personnel  re- 
quirements has  made  their  consideration  a  major  concern  with- 
in the  systems  acquisition  process.  The  need  for  tools  to  aid 
in  this  consideration  has  led  primarily  to  the  development  of 
models  and  techniques  which  address  the  identification  of  those 
requirements. What has been lacking is a means to determine
and  provide  guidance  for  the  accommodation  of  the  potential  im- 
pacts of  a  changing  military  force  structure  on  their  fulfillment. 
The  methodology  described  here  is  a  first  step  toward  the  com- 
prehensive assessment  of  weapon  system  design,  personnel  re- 
quirements, and  support  plans  in  terms  of  the  future  availability 
of  military  personnel. 

The  heart  of  the  PAM  methodology  is  a  computerized 
model  which  represents  career  transition  activity  within  the  Air 
Force  by  a  series  of  Markov  processes,  each  depicting  a  sub- 
population  of  airmen,  with  states  defined  by  years  of  service 
and  paygrade.  State  transition  probabilities  are  calculated  on  the 
basis  of  actual  transition  activity  data  contained  in  the  Uniform 
Airman  Record  (UAR).  Subpopulations  may  either  be  defined  on 
an  a-priori  basis,  such  as  by  Air  Force  Specialty  Code  (AFSC) 
designation,  or  analytically  established  by  applying  a  discrete 
dependent  variable  regression  analysis  technique  called  Logit 
Analysis.  This  technique  identifies  subpopulations  consisting  of 
personnel exhibiting similar career transition behavior and
describes them in terms of individual attribute data contained in
the  UAR.  It  increases  career  projection  accuracy  by  reducing 
uncontrolled  variance,  and  provides  increased  specificity  in  the 
analysis  of  personnel  policy  change  impacts. 

The PAM methodology includes computer programs
which  extract  and  combine  data  elements  from  the  UAR  to  form 
an  addressable  data  bank.  Presently,  this  data  bank  contains  a 
selection  of  data  elements  from  the  1975,  1976,  and  1977  UAR 
files  for  approximately    95,000  airmen  assigned  to  thirteen 
AFSCs. 


BACKGROUND 


The cost and quality of trained system support personnel
have become extremely important considerations in weapon
system  design  and  support  planning.  The  primary  reason  for 
attributing  such  importance  to  the  role  of  human  resources  in 
weapon  system  development  is  the  growing  concern  that  their 
cost,  which  presently  overshadows  that  of  system  acquisition, 
will  grow  to  a  size  which  will  effectively  preclude  the  afford- 
ability  of  future  systems.  The  Air  Force,  in  particular,  as  a 
branch  of  the  Armed  Services  whose  operational  effectiveness 
is  most  often  measured  in  terms  of  the  capabilities  of  its  weapon 
systems,  has  the  unfortunate  distinction  of  being  the  Service 
most  likely  to  experience  the  object  of  that  concern. 

This situation has precipitated considerable research
concerning the development of tools and techniques to implement
the consideration of human resources implications of design,
operation and support within the systems acquisition process.
Emphasis has been placed, however, on human resources as
requirements rather than as vital commodities whose availability
and operational disposition over time may be as crucial to new
weapon system deployment as is their timely specification as
system ownership requirements. (Slide 1)
The personnel availability methodology which this paper
describes is one product of an effort which expands that emphasis
to address: (1) the present and future capability of a military
personnel force structure to respond to those requirements;
and  (2)  how  that  structure  may  be  perturbed  to  increase  its 
ability  to  do  so.  That  effort  is  the  Air  Force  Human  Resources 
Laboratory  Project  1959,  entitled  "Advanced  System  for  Human 
Resources  Support  of  Weapon  System  Development."  It  was 
undertaken  to  develop  a  coordinated  human  resources  technology 
package  which  combines  the  results  of  previous  research  in  sev- 
eral human  resources  related  technologies  to  provide  a  single 
integrated  mechanism  for  the  use  of  human  resources  consider- 
ations as  system  design  and  support  planning  guidelines.  The 
overall  objective  is  to  avoid  unnecessary  system  ownership  cost 
through  a  methodical  consideration  of  all  aspects  of  personnel, 
manpower,  and  training  within  the  design  process  itself. 

Much  of  the  integrated  technology  package  addresses  the 
early  assessment  of  system  design  and  support  alternatives  in 
terms  of  their  potential  impact  on  human  resources  requirements. 
However,  that  portion  which  constitutes  the  personnel  availability 
model  (PAM)  methodology  complements  that  activity  by  providing 
further  guidance  concerning  the  feasibility  of  meeting  those  re- 
quirements within  the  constraints  imposed  by  present  and  fore- 
seeable circumstances  of  personnel  availability.  It  allows  system 
planners  the  option  of  either  designing  a  system  in  compliance 
with a predicted personnel availability situation or seeking means
to  alter  that  situation  to  provide  for  a  mission  essential  design 
capability.  In  the  former  instance,  the  PAM  methodology  can 
probabilistically  define  the  composition  of  the  personnel  force 
structure  at  the  time  of  system  deployment,  thus  identifying  a 
design/planning  requirement.  In  the  latter,  it  can  provide  a 
vehicle  for  rapid  hypothesis  testing  in  a  search  for  that  set  of 
personnel  policy  actions  most  likely  to  result  in  the  availability 
of  appropriate  personnel  to  meet  mission  essential  support 
requirements.

The  modeling  approach  to  personnel  availability  analysis 
embodied  in  the  PAM  methodology  is  neither  new  in  concept  nor 
uniquely superior to others in terms of its ability to provide
panacea-like  solutions  to  insurmountable  problems.  However,  as 
a  total  package  of  model,  data  base,  and  application  techniques, 
it  represents  an  important  first  step  toward  the  comprehensive 
assessment  of  weapon  system  design,  personnel  requirements, 
and  support  plans  in  terms  of  the  means  available  to  operation- 
ally accommodate  them. 


OPERATION  OF  THE  MODEL 


The  heart  of  the  PAM  methodology  is  the  personnel 
availability model itself. Its objective function is to provide
estimates of future personnel availability on the basis of career
transition activity projections derived from historical data
indicating current and past force structure composition and past
career transition activity. Derived primarily to meet needs identified
for  guidance  within  the  Air  Force  systems  acquisition  process, 
that  function  is  predicated  on  the  following  five  capability  require- 
ments:  (1)  evaluation  of  the  current  human  resources  in  the  Air 
Force;   (2)  estimation  of  that  human  resources  complement  at 
future  points  in  time;  (3)  comparison  of  estimated  human  re- 
sources availability  to  estimated  requirements  at  coincident  points 
in  time;   (4)  quantification  of  differences  between  human  resources 
requirements  and  estimated  availability;  and  (5)  identification  of 
personnel  policy  changes  necessary  to  reduce  or  eliminate  potent- 
ial disparities  between  future  personnel  requirements  and  future 
personnel  availability. 

It  was  originally  expected  that  a  model  could  be  selected 
or  adapted  from  among  the  many  manpower/personnel  models 
which  exist  today.  To  a  certain  extent  that  expectation  was  borne 
out.  However,  additional  operating  requirements  were  identified 
which  extended  the  modeling  capability  requirements  of  this  effort 
beyond  those  of  existent  candidate  models  identified  in  an  exten- 
sive literature  search.  These  requirements  called  for:  the  ident- 
ification and  tracing  of  actual  personnel  career  transition  activity; 
the  consideration  of  management  conditions  within  the  manpower 
system, such as training and retirement policies; the calculation
and  use  of  probabilistic  information;    minimization  of  computational 
requirements  without  substantial  loss  in  capability  to  accurately 
reflect  the  actual  functioning  of  the  Air  Force  manpower  system; 
and  operational  specificity  sufficient  to  assess  the  career  transi- 
tion activity  of  subpopulations  within  the  total  Air  Force  personnel 
population,  defined  by  personnel  attribute  designations,  while 
maintaining  enough  flexibility  to  investigate  larger  aggregate  popu- 
lations. In  addition,  the  model  to  be  selected  had  to  be  capable  of 
projecting  the  future  size  of  the  personnel  complement  to  be  found 
within a subpopulation category, itself defined either by personnel
attribute  or  career  status  designations. 

In  order  to  meet  the  previously  defined  modeling  require- 
ments and  to  provide  a  realistic  representation  of  the  Air  Force 
manpower  system,  the  career  transition  process  of  Air  Force 
personnel is most tractably modeled as a finite-state,
discrete-time Markov process. This conclusion was reached in
consideration of the following aspects of the Air Force manpower
system which are compatible with such a model: (Slide 2)

1)  It  is  hierarchical  when  states  are  defined  by  years 
of  service  (YOS)  and  paygrade.  Airmen  can  only  move  from  low 
paygrades  and  low  years  of  service  to  higher  paygrades  and  higher 
years  of  service,  if  they  are  to  remain  in  the  system. 

2) It is approximately Markovian. An airman's state
(YOS, paygrade) at time t+1 depends primarily on his state at
time t, and less so on his state at prior times.

3) It is discrete. An airman transitions from one state
to  another  at  yearly  intervals,  rather  than  at  random  time  inter- 
vals. 

A  Markov  model  is  structurally  suited  to  take  advantage 
of  the  above  three  properties  of  the  Air  Force  manpower  system, 
and  also  has  the  virtue  of  being  computationally  facile.  These  facts 
are  underscored  by  the  mechanical  simplicity  of  the  Markov  model 
chosen  to  meet  the  requirements  of  this  effort. 

A population possessing user specified attributes is
partitioned by the model into a state matrix, with states defined by
YOS and paygrade. (Slide 3) Once the state matrix for a given
population has been defined, the model computes the probabilities
associated with the various allowable types of state transition.
This is accomplished on the basis of two sets of historical data
abstracted from the Uniform Airman Record (UAR). The data are
sampled at two points in time, the interval between which is
determined by the projection interval which a user desires to be
produced by the model. In the present case, a one year model
projection interval was desired. Therefore, the UAR was sampled
at two points in time one year apart (1975 and 1976). It should
be noted here that the model's overall prediction of personnel
availability at a future point in time is accomplished on the basis
of an iterative updating of its state population projection. That is,
the model continually applies the UAR data-derived transition
probabilities (calculated from the actual transitions indicated by
the two point data sample) to its most recent state population
projection matrix, until the user specified outyear termination
point is reached. The final state population projection matrix is
the airman population prediction for that outyear bounded, of
course, by whatever personnel attribute or career status
designations the user has chosen to impose as output constraints.

Several assumptions were made concerning both the flow
of airmen through the Air Force manpower system and external
policy considerations which might conceivably affect the
probabilities associated with various types of career transition activity.
(Slide 4) In formulating the model, it was assumed that once a
person enters a particular state the probabilities associated with
his next transition are independent of how or from where he may
have arrived at that state. That is, the probability associated with
his next transition must be selected from those available to any
other person in that same state. The second model formulation
assumption is that a transition must occur within the model
projection interval. The third is that transitions must indicate
progress either in years of service or paygrade, i.e., no demotions
are allowed and time in service must increase without regard for
the actual but rare incidence of breaks in tenure. The fourth
assumption postulates a constant recruitment rate. This is to keep
the transition probabilities "clean" with respect to variables other
than those under examination. As will be explained later, any of
these assumptions may be purposively violated by the user by
exercising the user-interactive features of the PAM.

A  typical  state  within  the  model  is  illustrated  in  Slide  5. 
Such  a  state  represents  a  particular  service  tenure  and  paygrade 
within  a  single  Air  Force  Specialty  Code  (AFSC),  and  may  be 
visualized  as  a  single  cell  within  a  three  dimensional  state  popu- 
lation projection  matrix  bounded  by  those  variables.  It  is  so  rep- 
resented within  the  PAM,  with  the  additional  consideration  that 
the  population  of  each  state  is  also  bounded  by  the  set  of  pers- 
onnel attribute  constraints  imposed  by  the  model  user.  In  the 
illustration,  the  solid  arrows  indicate  the  transition  probabilities 
that produce a new state population (S'_{i,j}). Segmented arrows
indicate the ways in which personnel may leave a typical state. As
is  shown,  transition  into  a  state  can  occur  only  in  one  of  three 
ways:    (1)  a  transfer  from  some  other  AFSC,  or  a  new  accession; 
(2)  an  increase  in  paygrade  with  an  incremental  increase  in  YOS; 
or  (3)  an  incremental  increase  in  YOS  without  a  change  in  paygrade. 

Basically,  the  PAM  examines  its  historical  data  base 
and  calculates  the  exit  probabilities  associated  with  each  state  as 
determined  by  historical  precedent.  It  then  forms  a  state  and 
probability  data  base  which  is  comprised  of  transition  probability 
matrices  for  upgrade,  increment,  loss  from  service,  and  trans- 
fer. These  matrices  become  the  bases  for  determining  the  com- 
position of  future  state  populations.  The  basic  Markov  formula- 
tion for  these  future  state  probability  calculations  is  shown  in 
Slide  6.     The  following  provides  additional  detail. 

Letting S_{i,j} denote a state at time t with year of service
i and paygrade j, and S'_{i,j} denote the state population at time t+1,
the following equation defines the computation for determining a
new state value for the succeeding time interval:

S'_{i,j} = S_{i-1,j} \, PI_{i-1,j} + S_{i-1,j-1} \, PU_{i-1,j-1} + T_{i,j}

where:

S'_{i,j} = the state population at a point in time t+1, having i years
of service and j paygrade;

S_{i-1,j} = the state population at a point in time t, having i-1
years of service and j paygrade;

S_{i-1,j-1} = the state population at a point in time t, having i-1
years of service and j-1 paygrade;

PI_{i-1,j} = the probability that people in state S_{i-1,j} will increment
one year in service but will not leave the population or
upgrade;

PU_{i-1,j-1} = the probability that people in state S_{i-1,j-1} will
upgrade to S'_{i,j} within the next time interval;

T_{i,j} = the number of people from outside the population that will
transfer into state S'_{i,j} within the next time interval.

It should be noted that, since year of service is monotonic, the
i  subscript  carries  time  in  the  equation  for  the  succeeding  state. 
New  state  population  values  are  determined  by  the  transitions 
from  the  state  preceding  it  by  one  time  interval.  Probability  of 
loss  from  a  state,  by  exit  from  the  Service  or  transfer  to  another 
population,  is  not  included  in  the  previously  described  equation 
but  is  taken  into  consideration  in  the  more  comprehensive  state 
transition  equations  within  the  PAM. 
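
As a rough illustration of the simplified equation above, the following Python sketch steps a small state matrix forward one interval at a time. It is an assumption-laden sketch rather than the PAM code itself: the function names, the list-of-lists layout (rows indexed by years of service, columns by paygrade), and the choice to hold the transfer matrix T constant across intervals are all hypothetical, and losses are represented only implicitly, as in the simplified equation.

```python
def project_one_interval(S, PI, PU, T):
    """Apply S'[i][j] = S[i-1][j]*PI[i-1][j] + S[i-1][j-1]*PU[i-1][j-1] + T[i][j].

    S, PI, PU, and T are equally sized lists of lists; row index i is the
    year-of-service index and column index j is the paygrade index.
    """
    n_yos, n_grade = len(S), len(S[0])
    new_S = [[0.0] * n_grade for _ in range(n_yos)]
    for i in range(n_yos):
        for j in range(n_grade):
            # Airmen who gained a year of service without changing paygrade.
            incremented = S[i - 1][j] * PI[i - 1][j] if i > 0 else 0.0
            # Airmen who gained a year of service and one paygrade.
            upgraded = S[i - 1][j - 1] * PU[i - 1][j - 1] if i > 0 and j > 0 else 0.0
            new_S[i][j] = incremented + upgraded + T[i][j]
    return new_S


def project(S, PI, PU, T, n_intervals):
    """Iteratively update the state matrix until the chosen outyear."""
    for _ in range(n_intervals):
        S = project_one_interval(S, PI, PU, T)
    return S
```

For example, with 100 airmen in the first state, PI = 0.6 and PU = 0.2 everywhere, and 50 accessions per interval into the first state, one interval leaves 60 airmen one year further along in the same paygrade and 20 in the next paygrade. The remaining 20 are the implicit losses. The sketch does not aggregate the final years of service into a single row.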


PAM  DATA  BASE 


In  the  most  general  terms,  the  PAM  data  base  can  be 
described as being comprised of two sets of data which provide
"snapshot" pictures of the Air Force personnel population at two
points in time. Their comparison, within the model, reveals the
career transition activity which has taken place within the time
interval selected and provides the means to project that which will
take place during successive time intervals in the future. The
equation described above defines the process for that projection.
The data bases are abstracts of the Uniform Airman Record
(UAR). The current PAM data base was constructed from the 1975
and 1976 UAR and covers approximately 95,000 airmen in thirteen
technical Air Force Specialty Categories. (Slide 7) Data for
individual personnel are assembled on PAM records and address such
things as test scores, duty assignments, personnel history, pay
levels, etc., which the PAM uses as personnel descriptors or
attributes. The records cover 24 of the 450 identifiable personnel
characteristics contained in the UAR. (Slide 8) The present
selection was made on a judgemental basis and could conceivably be
improved for specific PAM application objectives by a more detailed
consideration of the possible relationships between individual
attributes and those objectives.

The  actual  process  of  UAR  data  abstraction,  necessitated 
by  the  voluminous  nature  of  the  UAR,  is  undertaken  external  to 
the PAM. It should be considered a preparation process, rather
than a PAM function, in that the proper selection of attributes to
be  abstracted  demands  considerable  user  judgement.  In  any  case, 
the  mechanization  of  the  process,  once  selection  decisions  are  made, 
is  a  straightforward  task.  In  the  present  instance,  it  was  rapidly 
accomplished  by  the  Computation  Sciences  Division  of  the  Air 
Force  Human  Resources  Laboratory.  Once  the  abstraction  process 
is  performed  and  the  two  PAM  data  record  sets  are  compiled,  the 
PAM  takes  them  as  inputs  and  combines  them  to  form  a  single 
record set. This combined record set is then used by the PAM to
generate transition probability data, i.e., future behavioral
probability data based on the recorded career transition activity
concerning whether individuals in certain AFSCs incremented in years of
service, upgraded in pay status, did both, transferred to another
AFSC, or left the Service within the time interval covered.

Transition probabilities are computed using the transition
data  and  the  following  algorithms: 

(1) Computation of state (S) matrix:

S_{i,j} = number of airmen in the population with years of service (i)
and paygrade (j);

(2) Computation of probability (P) matrices:

PU_{i,j} = (number of airmen who left state (i,j) during the time
interval via upgrade) / S_{i,j}

PI_{i,j} = (number of airmen whose grade remained the same
during the time interval) / S_{i,j}

PL_{i,j} = (number of airmen who left state (i,j) during the time
interval by leaving the Service) / S_{i,j}

where, in each denominator, S_{i,j} = population of state (i,j) at
beginning of time interval.
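
A minimal Python sketch of these computations, assuming each snapshot is a dictionary that maps a hypothetical airman identifier to a (years of service, paygrade) pair, might look like the following. The function name and data layout are illustrative assumptions. The sketch also treats any airman missing from the second snapshot as a loss from the Service, which ignores the AFSC transfer logic handled separately in the PAM.

```python
from collections import Counter


def transition_probabilities(before, after):
    """Estimate PU, PI, and PL for every state present in `before`.

    before, after: dicts mapping airman id -> (years_of_service, paygrade)
    at the two sampling points. All names here are hypothetical.
    """
    S = Counter(before.values())          # state populations at the start
    upgraded, same_grade, lost = Counter(), Counter(), Counter()
    for airman, state in before.items():
        later = after.get(airman)
        if later is None:
            lost[state] += 1              # assumed to have left the Service
        elif later[1] > state[1]:
            upgraded[state] += 1          # paygrade increased
        elif later[1] == state[1]:
            same_grade[state] += 1        # paygrade unchanged
    PU = {s: upgraded[s] / n for s, n in S.items()}
    PI = {s: same_grade[s] / n for s, n in S.items()}
    PL = {s: lost[s] / n for s, n in S.items()}
    return S, PU, PI, PL
```

With four airmen in state (1, 3), of whom two keep their paygrade, one upgrades, and one disappears from the second snapshot, the sketch yields PI = 0.5, PU = 0.25, and PL = 0.25 for that state.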


Probabilities  associated  with  airmen  transfer  in  and 
out  of  Air  Force  Specialty  Categories  (AFSCs)  are  included  among 
the  probability  matrices  calculations  within  the  PAM.  The  process 
by  means  of  which  they  are  generated  involves  a  comparison  of 
the  four  types  of  AFSC  designations  found  in  the  UAR   viewed  at 
the two points in time which bound the interval of historical data
sampling.  Discussion  of  that  process  is  beyond  the  scope  of  this 
paper. 


PAM PROGRAMS AND FUNCTIONS


The  PAM  is  comprised  of  two  computer  programs  which 
perform the following four functions: (1) data base generation;
(2) data base maintenance; (3) extrapolation over time; and (4)
data post-processing. (Slide 9)
Each function is performed by a program or program subroutine
and the functions are sequential in the sense that the output of
one function serves as the input to the next function in the
sequence.

Program  1  (Data  Base  Generation  Function) 

This  program  selects  the  personnel  records  on  the  basis 
of  user-selected  criteria  for  screening  on  particular  attributes. 
The  records  selected  are  processed  and  used  to  create  matrices 
for  use  by  subroutines  two  and  three  of  program  2.  Once  the 
records have all been processed, the program calculates
probabilities (P's) of upgrade, increment in years of service, and loss
for each AFSC, paygrade, and year of service state as was
previously described. The program also accumulates matrices for
numbers of transfers by AFSC, years of service, and paygrade.

Program  2  (Model  Operation  Function) 

This  program  consists  of  three  subroutines  and  is  user- 
interactive  via  remote  terminal  facilities.  It  performs  the  following 
tasks:    (1)  data  base  maintenance  which  allows  for  user-entered 
override  modifications  to  the  state,  transfer,  and  probability  mat- 
rices;  (2)  operation  of  the  probability  extrapolator  model  for  a  user- 
specified  number  of  time  intervals;   and  (3)  printout  of  results 
and/or  current  values  contained  in  the  state,  transfer,  or  probab- 
ility matrices.  All  of  the  above  functions  are  controlled  by  the  user 
at the computer terminal. Through responses to a series of
questions displayed at the terminal, program execution will select
subroutine 1, 2, or 3 of this program, depending upon the specific
requirements of a task defined by the user. (Descriptions of these
subroutines are given below.) Each time a user input is required,
a statement followed by a question mark will be displayed to
prompt the user. At those points, program execution will pause
for the user response and resume once it is made. Termination
of  each  user  input  is  signaled  to  the  program  by  striking  the 
carriage  return  key.  The  three  subroutines  function  as  follows. 

Program  2;   Subroutine  1  (Data  Base  Maintenance  Function) 

This subroutine is used to modify the state, probability,
or transfer matrices on an element-by-element basis. The
subroutine will perform edit and reasonableness checks, i.e., allow
user override of the matrix cell entries. Its use is optional and
can be bypassed if the user is satisfied with the matrices
computed from the input data supplied by Program 1.

Program  2;   Subroutine  2  (Extrapolation  Function) 

This  subroutine  produces  the  projections  of  state  trans- 
fer, using  the  matrices  calculated  in  Program  1.  It  steps  the 
state  matrix  forward  in  time,  interval  by  interval,  and  stores  the 
output  in  a  Result  File  for  future  output  to  the  user  in  several 
ways  and  formats  which  he  may  designate.  Examples  of  such 
outputs  are:  projections  bounded  by  specified  years  of  service, 
paygrades,  and/or  by  specific  outyear  restrictions;   and  display 
via  terminal  screen  or  printed  hard  copy. 

Program  2;   Subroutine  3  (Post  Processing  Function) 

This  subroutine  selectively  lists  any  part  of  the  Result 
File  created  by  subroutine  2.  The  portion  to  be  listed  is  a  user 
option.  Current  programming  allows  the  user  to  impose  output 
parameter restrictions which yield listings within the following
formats:

(1)  The  entire  state  matrix  for  paygrades  3  through  9 
and  years  of  service  1  through  21,  where  years  of  service  21 
is  an  aggregation  of  years  21  through  30; 

(2)  Any  selected  combination  of  states;  and 

(3)  Breakout  by  years  of  service  with  all  paygrades  being 
collapsed  to  yield  a  single  line  matrix  output. 
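
The first and third listing formats can be sketched in Python as simple reductions of a projected state matrix. The helper names and the list-of-lists layout (rows for years of service, columns for paygrades) are assumptions made for illustration; they are not taken from the PAM programs.

```python
def aggregate_final_years(state, last_index):
    """Fold every row at or beyond last_index into one aggregate row.

    Mirrors the idea of reporting a final years-of-service row that
    aggregates all later years. Requires last_index < len(state).
    """
    head = [row[:] for row in state[:last_index]]
    tail = [sum(column) for column in zip(*state[last_index:])]
    return head + [tail]


def collapse_paygrades(state):
    """Sum across paygrades to give one total per year of service."""
    return [sum(row) for row in state]
```

For a four-row matrix `[[1, 2], [3, 4], [5, 6], [7, 8]]`, `aggregate_final_years(state, 2)` gives `[[1, 2], [3, 4], [12, 14]]` and `collapse_paygrades(state)` gives `[3, 7, 11, 15]`.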

APPLICATION TECHNIQUES


Two  very  important  features  are  built  into  the  PAM. 
The  first,  provided  by  Program  2/Subroutine  1,  allows  the  user 
to make changes in the state or probability matrices, thus giving
him the opportunity to make state population projections on the
basis  of  postulated,  as  well  as  real,  initial  state  conditions  and 
personnel  career  flow  constraints.  The  second  feature  provides 
the PAM user with a capability to investigate the career
transition activity of subpopulations selected on the basis of their
members possessing certain combinations of personnel attributes,
e.g., age, education, race, sex, etc. (Slide 10) Personnel availability
investigation on the subpopulation level is desirable because the Air
Force  manpower  system,  while  not  truly  a  homogeneous  system, 
has  been  modeled  within  the  PAM  as  a  Markov  process.  Specifi- 
cally, various  subpopulations  of  airmen  have  been  found  to  have 
different  career  transition  rates.  Therefore,  an  increase  in  mod- 
eling accuracy  may  be  obtained  by  dividing  subject  populations 
into  homogeneous  subpopulations  for  individual  examination. 

The  PAM  is  structured  such  that  projections  are  made 
on  the  basis  of  years  of  service  (YOS)  and  paygrade.  However, 
other factors may significantly affect the career transition
process. Airmen with identical YOS and paygrade at present may be
expected to transition to different states, independently of how
they arrived at the current state, if there are sufficient
differences among their other attributes such as test scores, marital
status,  or  sex.  Such  differences  are  handled  in  the  PAM  method- 
ology by  two  distinct  mathematical  approaches  to  the  detailed  ev- 
aluation of  human  resources  availability  in  terms  of  their  group- 
ing on  the  basis  of  personnel  attributes.  Possible  modeling  in- 
accuracies which  might  arise  as  a  result  of  attribute  heterogen- 
eities are  accounted  for  by  the  calculation  of  separate  transition 
rates  for  each  homogeneous  subpopulation  that  can  be  identified 
by  attributes  other  than  YOS  and  paygrade.  Such  descriptors  may 
be  thought  of  as  driving  attributes. 

The first approach, which can be thought of as an
attribute identification or categorical approach, employs a qualitative
discriminant analysis using graphical displays of transition
frequencies versus attributes. It identifies the attribute or
combination thereof which indicates a high concentration of specific
transitions. Directed at determining which groups of airmen, or
subpopulations, transition alike, it attempts to flag any attributes that
are capable of being used to distinguish between airmen
subpopulations exhibiting dissimilar career transition rates. This is
accomplished by exposing incidents of career transition activity
characterized by disproportionate numbers of transitions relative
to the number of airmen possessing a given attribute or set of
attributes. The resultant grouping of airmen reflects a
categorization on that basis. The groups are then examined as individual
entities which constitute homogeneously transitioning
subpopulations.

The second approach is directed at the determination of a functional
relationship between specific transitions and the attributes, as in
multiple regression analysis. A specialized statistical technique
called Logit Analysis is used to aid in that determination by
establishing dependency relationships among the various attributes.
The object of that analysis, rather than the identification of
homogeneous groups of airmen in terms of career transition rates, is
the determination of how the transition rates are related to the
specific attributes themselves. In this formulation, transitions are
viewed as a response (dependent) variable and the UAR data are taken
to constitute a vector of explanatory (independent) variables. The
variables are operated upon by a mathematical model, using binomial
logit analysis, that in effect regresses the dependent variables on
the independent variables to yield indicants of relationship. Results
of this analysis are particularly well suited to aiding the
identification of airmen groups that have a distinct set of attributes
and a set of transition probabilities different from the average
movement parameters of the total airmen population, i.e., homogeneous
subpopulations.
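
As a rough sketch of what a binomial logit fit involves, the following Python fragment estimates the weights in log(p/(1-p)) = b0 + b1*x1 + ... by plain gradient ascent on the log-likelihood. The fitting routine, the single 0/1 attribute, and the counts are assumptions made for illustration; the paper does not describe the estimation procedure actually used in the PAM.

```python
import math


def fit_logit(X, y, learning_rate=0.5, iterations=3000):
    """Fit a binomial logit model by gradient ascent (illustrative only).

    X: list of attribute vectors; y: list of 0/1 transition outcomes.
    Returns [b0, b1, ...] where b0 is the intercept.
    """
    n_features = len(X[0])
    beta = [0.0] * (n_features + 1)
    for _ in range(iterations):
        gradient = [0.0] * (n_features + 1)
        for row, outcome in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted transition probability
            error = outcome - p
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        beta = [b + learning_rate * g / len(X) for b, g in zip(beta, gradient)]
    return beta


# Hypothetical data: one 0/1 attribute; 30 of 100 airmen with the attribute
# transition, against 5 of 100 without it.
X = [[1]] * 100 + [[0]] * 100
y = [1] * 30 + [0] * 70 + [1] * 5 + [0] * 95
beta = fit_logit(X, y)
print(beta)
```

The fitted weight on an attribute plays the role of the "indicant of relationship": a large positive weight marks an attribute associated with a higher transition probability.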

In summation, categorical analysis involves the identification of an
attribute(s) that coincides with non-uniform population transition
rates. It constitutes a search for driving attributes to define
similarly transitioning subpopulations by comparing the frequency
distributions of transitions. Logit analysis is directed towards the
establishment of a weighting scheme for the attributes relative to
transition rates. The process of subpopulation identification, state
and probability matrices calculations, and human resources
availability projection is functionally illustrated in Slide 11.
Slide 12 provides a detailed illustration of PAM operation, to include
the PAM methodology for identifying homogeneous subpopulations. The
application methodology described above allows the subpopulation
selection features of the basic PAM to be used intelligently on the
basis of data implications, rather than that of trial and error.
There is, however, a variable which, although not directly addressed
by the PAM methodology, does bear significantly on the validity of the
PAM personnel availability projections. That variable is recruitment
rate. Normally, it may be assumed to be constant. However, the PAM
data base maintenance function subroutine may be used to input changes
at the discretion of the user.
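
A projection from state and probability matrices, with recruitment treated as a constant yearly input, might look roughly like the sketch below. The two states, the transition probabilities, and the recruit counts are invented for illustration, and the function is not the PAM's own routine.

```python
def project_availability(population, transition, recruits, years):
    """Project personnel counts per state over a number of years.

    population: current count in each state.
    transition[i][j]: assumed probability of moving from state i to state j
        in one year; a row may sum to less than 1, the remainder separating.
    recruits: count added to each state every year (assumed constant).
    Returns a list of state vectors, starting with the current one.
    """
    history = [list(population)]
    current = list(population)
    n_states = len(current)
    for _ in range(years):
        following = [
            sum(current[i] * transition[i][j] for i in range(n_states)) + recruits[j]
            for j in range(n_states)
        ]
        history.append(following)
        current = following
    return history


# Hypothetical two-state example: first-term and career airmen.
history = project_availability(
    population=[1000, 500],
    transition=[[0.6, 0.2],   # first-term: stay, move to career (rest separate)
                [0.0, 0.9]],  # career: never revert, stay (rest separate)
    recruits=[300, 0],
    years=2,
)
print(history)
```

Changing the recruits argument from year to year would correspond to the user-supplied recruitment changes mentioned above.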

Slide 13 illustrates an additional use for the identification of
homogeneous subpopulations, other than prediction of total population
availability on the basis of aggregating predictions for the
subpopulations which it subsumes. That additional use is in the
analysis of the potential impacts of personnel policy changes on
career transition activity and future availability. It should be
noted that, although the PAM provides a basic capability for this kind
of analysis, the analysis of personnel policy impact assumes a
knowledge of the relationships between policy and the personnel
attributes which the PAM uses to track personnel career progression.


SUMMARY 


A  model  has  been  developed  for  projecting  the  future 
availability  of  Air  Force  maintenance  personnel  based  on  the  1975 
and  1976  career  transition  activity  of  95,000  airmen.  The  utility 
and  specificity  of  the  model,  and  the  accuracy  of  the  projection 
results,  have  been  expanded  by  the  development  of  an  application 
methodology. That methodology incorporates certain statistical
techniques not only to examine the Air Force manpower structure on a
more individual basis than was heretofore possible but also to
identify personnel attributes that are related to tenure in the Air
Force.

The PAM and its methodology for personnel availability analysis are
capable of meeting the needs of an analyst who, in the design phases
of weapon system acquisition, desires to determine whether planned
maintenance manpower requirements may be expected to be fulfilled by
the Air Force human resources supply at the time of weapon system
implementation. If human resources availability projections indicate
that requirements may not be met, the PAM provides a capability which
can be of aid in seeking personnel policy changes which can be
implemented to effect changes in the future availability of the
required personnel such that future requirements can be met.


Some Implications of Commercial Test Normings
for Mobilization Surveys


I assume that, in an abstract way, the object of the study contemplated
by this symposium can be regarded as the description of a large segment of
the American population in terms of variables that are of military interest.
A major problem is that we don't normally access that segment for testing.
One approach is to try to estimate the statistics of interest through their
relationship with variables that are included in larger, more comprehensive
surveys of populations similar to the population of interest. I will,
however, focus on a different alternative, that of operating one's own
survey, using the development of national test norms as an example (Jackson
& Schrader, 1976).

This problem is not unlike one faced by the College Entrance Examination
Board, for which we (ETS) are the technical contractors. The Board owns the
Scholastic Aptitude Test (SAT), a test used for [...]
of American colleges and universities. It is useful to secondary school
students to know where they might stand in the SAT score distributions. To
supply this, the Board periodically has us undertake to find the distribution
of test scores in American secondary schools. In this, we have a situation
similar to that of the military. The SAT, like the ASVAB, in whose
distributional statistics we might be interested, [...]
both testing programs there is another big pool of people out there about
whom we would like to know.

Actually, the Board doesn't use the SAT to construct norms; it uses a
somewhat shorter test called the Preliminary Scholastic Aptitude Test
(PSAT), [...] the same test development specifications. Also, for [...]
in the sample and that also regularly administer the PSAT, the job is only
one of getting them to extend their testing to include the whole class (as
opposed to introducing the school to a brand new [...]). Since a large
portion of American secondary schools regularly administer the PSAT, [...]
considered for studying variables of military interest.

Next, we choose schools as the sampling unit. It's the most feasible
[...] American secondary schools for the College [...]


for  studying  the  population  is  to  develop  a  frame,  but  I  can  hardly  avoid 
saying it. The problem that I see for mobilization studies is that the
population, as defined, is one for which it is very difficult to conceive
a  frame.  But  frame  construction,  at  any  rate,  is  one  of  the  first  things 
that  I  would  undertake.  Possibly,  in  sheer  desperation,  I  would  consider 
modifying  the  problem  to  be  one  where  a  frame  can  be  constructed. 

Next,  the  sample  is  to  be  drawn,  and  we  must  decide  how  many  cases 
are  necessary.     Those  of  us  who  have  been  involved  in  quantitative  research 
over  the  years  know  that  much  soul-searching  goes  with  setting  sample  sizes. 
Usually,  the  survey  sponsors  want  the  samples  to  be  small  because  that  keeps 
the  dollar  cost  down;  the  statisticians  want  the  sample  sizes  up  to  get 
narrow  confidence  bands.     For  myself,  I  have  arrived  at  the  following 
attitude,  one  I  can  reach  because  of  some  statistical  results  about 
simultaneous  confidence  intervals.     People  don't  ordinarily  identify 
simultaneous  confidence  intervals  with  surveys  of  this  kind,  but  I  believe 
an  important  implication  for  us  is  there.    We  usually  have  several  parameters 
that  we  want  to  estimate,  and  we  would  like  to  draw  some  conclusion  about 
the set. Test norms comprise a whole series of statistics, and we don't just
concern ourselves with the precision of each one at a time. Now, the
finding that motivates my attitude about sample sizes is that if one sets many
confidence  intervals  narrow  enough  to  be  useful,  more  cases  are  required  than 
there  is  population!     That's  impossible,  of  course;  it  happens  because  the 
intervals  are  derived  using  infinite  theory  and  the  populations  are  finite. 
But  the  thrust  of  it  is  that  whatever  you  do,  it  isn't  going  to  be  enough. 
Therefore,  my  attitude  is  that  you  should  take  what  the  traffic  will  bear; 
get  all  you  can  get  if  you're  interested  in  multiple  pieces  of  information. 
It's  very  seldom,  in  a  social  science  inquiry,  that  I've  encountered  a 
situation  in  which  any  really  credible  confidence  interval  construction  was 
involved; in fact, I can't remember any. Rather, there will be a certain
amount  of  money  available  to  do  the  study.     What  you  do  is  to  balance  your 
resources  so  that  you  can  do  the  best  job  for  your  sponsor  and  his  purposes. 
Within  that  context,  you  get  as  large  a  sample  as  you  can  get  and  still  get 
the rest of the job done; that's the state of the art. Of course, you should
calculate  confidence  intervals  for  single  pieces  of  information,  and  if  those 
are too broad to be meaningful for the sample size you have, you can
reasonably doubt whether the study should be performed without modification.
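
The way sample size climbs with the number of simultaneous intervals can be illustrated with a small Python sketch. The talk does not name the statistical result it relies on; the Bonferroni adjustment used here, the worst-case proportion of 0.5, and the one-point half-width are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def cases_needed(num_intervals, half_width, joint_confidence=0.95):
    """Cases needed so that num_intervals proportions are each estimated to
    within +/- half_width with the stated joint confidence.

    Assumes simple random sampling, the normal approximation, a worst-case
    proportion of 0.5, and a Bonferroni split of the error rate.
    """
    alpha_each = (1 - joint_confidence) / num_intervals
    z = NormalDist().inv_cdf(1 - alpha_each / 2)
    return math.ceil(z * z * 0.25 / half_width ** 2)


for k in (1, 10, 100):
    print(k, cases_needed(k, half_width=0.01))
```

Each added statistic pushes the required number of cases higher, which is the practical force behind "take what the traffic will bear."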

Therefore,  by  some  procedure  not  entirely  statistical,  we  arrived  at 
the  conclusion  that  there  should  be  200  schools!    The  next  problem  is  to 
get their cooperation and that of their students. How (and whether) you
secure that cooperation is crucial. Probably this seminar is supposed to
concern technical things, but I think that careful attention to securing
the cooperation of the participants is every bit as important, if not more
so. In securing this cooperation, the College Board has a lot going for it.
First of all, SAT scores have value, and as a consequence of participation,
they can be supplied. Thus, the student has access to information that can
help him forecast his position when the time comes for the grand sorting of
high school students into the slots of the world, and he can get it for
nothing. He gets some practice for the real thing, and the result never
hurts his record. He gets it a little bit earlier in his career than he
normally would, and that helps him plan his postsecondary school career
strategies. The student that really doesn't have a college career in mind
and had not planned to take the PSAT may, in fact, learn that there is
some desirable higher education alternative open to him that he would not
have known about otherwise. (Parenthetically, since we are discussing what
the examinee has to gain by taking the test, let me say that I have had
very little luck with money as an incentive for students. My belief is that
the amount of money needed to buy enough student cooperation to produce a
good study is more than one can afford. But I do think that there has to
be an incentive for examinees.)

Schools can have advantages, too. Those not usually participating in
the PSAT programs will gain experience with a national testing program with
which they haven't had previous experience, and will [...] information
that they didn't have. The schools that normally have their students
take the SAT or PSAT will have the national norms updated; they can use them
in accustomed ways and will have more complete guidance information about
their student bodies. Knowing that these things are going for them, it is
perfectly reasonable for the College Board to contact American secondary
schools, ask them if they would be interested and willing to participate in
national norms development for the SAT, and expect to get takers. I
emphasize all this strongly because I think that to [...]
you must approach the institutions and the examinees in such a way that they
have a reason to cooperate with you. In [...]
appeal to abstractions such as patriotism, and that will be effective in some
cases. But when you're pursuing cooperation of [...]
acceptance. You must have appeals with nearly [...].
Therefore I think you must somehow create an approach that establishes some
gain for the participants.

In any case, a letter is written to the schools and signed by the
President of the College Board. It [...]
motivate the schools to participate. At Educational Testing Service, we're
very careful about who signs such letters and how they are written; even
so, we are perhaps sometimes not as careful as we ought to be. Generally
we try to find signers who are of significance to the recipients, as do you
in military testing research. (I date myself by admitting it, but I have
once participated in, or observed, the drafting of a letter for the Army
[...]'s signature.) I think the source of the letter ought to
[...] service is a good purpose. But they're [...]
be as directly connected with the goals of a school [...]
educational associations. I think this very important part of it needs a
lot of thought.


Well,  we  did  our  best,  solicited  the  schools,  and  collected  the  data. 
How well did we do? In a 1974 study, the participation rate was 58.4% of the
schools.    We  don't  know  why  the  rate  was  that  low.    With  all  those  things 
going  for  the  study,  the  sample  obtained  still  wasn't  that  large,  so  we 
now  had  the  problem  of  deciding  how  much  of  an  effect  the  loss  of  schools 
had  on  the  result.     One  way  to  approach  this  problem  is  to  take  spot  surveys 
of  the  non-respondents.     If  you  do  this,  it  is  very  possible  that  the  results 
will  get  you  to  your  answer.    Maybe  that's  obvious,  but  let  me  give  you  an 
example  of  where  it  worked  out  very  neatly.     In  a  survey  of  elementary 
schools  for  the  National  Commission  for  Marihuana  and  Drug  Abuse  (Boldt, 
Reilly,  &  Haberman,  1976),  we  were  studying  the  types  of  drug  education 
programs  that  were  available  in  elementary  schools.     The  response  rate  was 
terrible,  less  than  10%.    Who'd  believe  a  4%  sample?    One  option  was  to 
find characteristics of that 4% sample and compare them to national
characteristics and find they're the same. But that isn't very convincing as a
procedure,  because  it  doesn't  explain  why  you  got  the  people  that  you  got; 
it merely tells you some ways that they're not different. But that doesn't
establish that they're the same; it just establishes that you didn't find
the ways they differ. In our case, we went back to a sample of schools who
didn't respond to us, and repeatedly called them until they talked to us.
Their reactions were fairly monolithic. The schools thought we had originally
contacted  them  by  mistake:    What,  after  all,  did  we  mean  asking  them  about  a 
drug  abuse  program  in  elementary  school?    They  didn't  believe  there  was  a 
drug problem in elementary school and felt that the survey simply didn't
apply  to  them.     In  no  case  was  there  a  consciously  formed  drug  education 
program.    The  obvious  point  is  that  anything  we  came  out  with  in  our 
sample is an overestimate of the magnitude of such education. We found in
our  data  that  such  education  occurred  in  very  few  places,  that  when  it  did 
occur  it  occurred  only  because  officials  at  the  next  higher  levels  of 
organization  wanted  it,  not  because  of  pressure  from  the  community.  The 
survey did establish that there wasn't local, immediate pressure [...]
a  perceived  drug  problem;  and  there  wasn't  much  education  going  on.  That 
was  part  of  our  answer,  but  we  got  it  by  asking  the  non-respondents,  not 
by  comparing  information  from  respondents  with  existing  statistics. 
Unfortunately, many times researchers do not ask the non-respondents why
they didn't cooperate. With schools, sometimes, it's just that the testing
area  isn't  big  enough.     If  you're  losing  schools  for  a  reason  like  that  you 
can  sample  or  work  out  some  other  compromise. 
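
The arithmetic behind the overestimate can be sketched as follows. The weighting shown is the usual follow-up (double sampling) adjustment, which the talk does not name; the 4% response rate and the absence of programs among followed-up schools come from the account above, while the 50% figure for respondents is purely hypothetical.

```python
def adjusted_proportion(response_rate, p_respondents, p_followup):
    """Combine respondents with a follow-up sample of non-respondents.

    response_rate: share of the contacted schools that answered.
    p_respondents: proportion with the characteristic among respondents.
    p_followup: proportion among the followed-up non-respondents, taken to
        stand for all non-respondents (an assumption of this sketch).
    """
    return response_rate * p_respondents + (1 - response_rate) * p_followup


# 4% responded; suppose (hypothetically) half of them reported a program,
# while none of the followed-up non-respondents had one.
naive = 0.50
adjusted = adjusted_proportion(0.04, naive, 0.0)
print(naive, adjusted)
```

Under these made-up figures the respondent-only estimate overstates the national rate by a factor of twenty-five.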

Another technique for the non-response problem is to send a small
simple questionnaire with the original letter of request and try very hard
to get it completed and returned by all persons contacted. You then
perhaps get a hypothesis about why they don't participate if, in fact,
they don't. Easily supplied descriptive information would be good, and I
suggest that you don't leave a place to say "we are not participating
because," because frankly that makes it too easy for them not to. Make
the special approach to non-participants separately.

In summary, it's best if you can capitalize on some existing program.
The fact that the program exists indicates that people have an interest in
it, and if you can relate what you're doing to that interest you have a
better chance of getting cooperation. Next, you need to establish in the
minds of the possible examinees and the possible participating schools or
sampling units, whatever they are, that they have a stake in this
enterprise, not because they have a stake in what you do but because they
have a stake in it because of what they do. Third, I suggest that you get
all the sample that you can get, consistent with the cost and requirements
of the rest of the job. Finally, make provision for collecting easily
supplied data from everybody you write to in the first place; for
those who don't cooperate, find out why, and modify your procedures, if
you can, to get them in. Those aren't technical procedures (I [...]
remarkably unstatistical, I guess, when I think back on them), but to the
extent that you need to go out and estimate national statistics, they have
been crucial in our educational research.


Abstract

Measurement  of  the  military  base  population  is  needed  to  serve  three 
general  purposes.    First,  knowledge  of  the  population  distribution  of  general 
and  specific  abilities,  vocational  interests,  and  some  skills,  can  facilitate 
high  quality  test  development  research.    The  World  War  II  general  ability 
measure  on  12-million  men  has  been  exceedingly  helpful,  and  it  is  time  to 
update  and  expand  it.    The  second  purpose  will  be  facilitation  of  manpower 
research,  through  obtaining  population  demographic,  biographic,  socio- 
economic data.     In  order  to  understand  and  manage  the  force,  understanding 
of  what  is  in  the  well  seems  critical.    The  third  purpose  will  be  facilitation 
of  recruiting  research  through  providing  parameters  of  popularity  and  avoidance 
in  the  relevant  age-group  population.     The  paper  identifies  some  sources  of 
the needed data for the population measures.


Measuring the Military Base Population of the 1980's

M.  A.  Fischl 

U.  S.  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 

Alexandria,  Virginia  22333 

There  is  a  clear  need  to  learn  what  the  population  distribution  of 
military  age  young  people  will  look  like  on  relevant  dimensions. 

Samples from the population, even very large samples which join the
military service over an extended period, differ very drastically on
basic attributes depending on the political, economic, and defense state
of affairs in the United States and the world at the time. Consider the
figure below, which is the distribution of AFQT scores of all men entering
the Army for the first time in two recent 12-month periods, fiscal years
1969 and 1977. In 1969 the country was at war; 1977 was a very recently
completed period under all-volunteer operations. Although essentially
the same in means, the dispersions are about as disparate as two
distributions can get. WHAT DOES THE POPULATION FROM WHICH THESE SAMPLES
WERE DRAWN REALLY LOOK LIKE? That is the point of today's symposium.

Measurement of the military base population is needed to serve three
general purposes. First, precise knowledge of the population
distribution on general and specific abilities, vocational interests and
perhaps some skills, can greatly facilitate high quality test
development research. Knowledge of the population distribution permits
research to utilize smaller samples, stratified to conform to the population
distribution, than would otherwise be needed; this translates to lower costs
and less disruption of operations. Knowledge of the population
distribution allows for more precise estimation of psychometric relationships
through enabling use of such statistics as range restriction corrections,
which are dependent on such information. This is doubtless one reason
why military employment test validity [...]
than those in private industry. The World War II general ability
measure on 12-million men has been exceedingly helpful for these 30-odd
years, and it is time to update and expand it to other psychological
domains.
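
To show why a range restriction correction depends on knowing the population distribution, here is a minimal Python sketch of one widely used formula, the correction for direct restriction on the predictor (often called Thorndike's Case II). The paper does not say which correction is meant, so the choice of formula and the example figures are assumptions.

```python
import math


def correct_for_range_restriction(r_sample, sd_population, sd_sample):
    """Estimate the population validity from a range-restricted sample.

    r_sample: test-criterion correlation observed in the selected sample.
    sd_population: predictor standard deviation in the full population.
    sd_sample: predictor standard deviation in the selected sample.
    """
    u = sd_population / sd_sample
    return r_sample * u / math.sqrt(1 - r_sample ** 2 + (r_sample * u) ** 2)


# Hypothetical figures: an observed validity of .30 in a sample whose spread
# on the test is half that of the population.
print(round(correct_for_range_restriction(0.30, 20.0, 10.0), 3))
```

The population standard deviation is the one quantity in the formula that cannot be obtained from the selected sample itself.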

The  second  general  purpose  served  by  knowledge  of  the  military  base 
population  will  be  facilitation  of  manpower  research.    A  separate  pool 
of  information  from  that  of  the  prior  paragraph,  but  obtainable  by  similar 
methodology,  consists  of  demographic,  biographic,  socio-economic  variables 
descriptive  of  the  population  of  young  adults.    A  few  years  ago  our  office 
did  a  small  feasibility  examination  of  some  sources  of  these  data,  to 
answer  the  very  relevant  question:     "How  representative  is  the  Army?". 
A particular source looked at was the U.S. Office of Education (Department
of HEW) National Longitudinal Study of the High School Graduating
Class of 1972 (NLS). We analyzed the original data set (Spring 1972)
and the first follow-up (October 1973). Some outcomes of that analysis
were:

a.  Longitudinal  capture  rate  from  Spring  72  to  Fall  73  was  86%. 

b. Of an N of approximately 20-thousand, 321 (1½%) had joined the Army.

c. These 321 cases divided as follows on some dimensions of interest:

Race:     75%  White,  17%  Black,  8%  Other 

Socio-economic  Status:     42%  Low,  42%  Middle,  14%  High 

Region:    Northeast  18%,  North  Central  27%,  South  35%, 
West  20%. 

High School Activities: Athletics 56%, Vocational
Education 21%, Hobby Clubs 22%, Drama, Debating,
Music 25%.

d.  We  did  not  review  all  dimensions. 

The  point  here  is  not  to  report  on  the  NLS  but  to  indicate  that  population 
data  on  these  types  of  variables  can  be  helpful  to  manpower  research.  To 
understand  and  manage  the  force  composition,  understanding  of  what  is  in  the 
well  seems  critical. 

The  third  purpose  we  see  facilitated  by  the  mobilization  population 
data  set  will  be  recruiting  research,  providing  precise  quantification  of 
domains  of  military  service  perceived  to  have  positive  and  negative  valences, 
and providing stable benchmarks for evaluation of changes in Service
recruiting policies.

How  may  these  ends  be  served,  yielding  a  census  of  young  people  in 
terms  of  the  types  of  variables  I  described?    Clearly  we  will  need  to  splice 
together  data  from  several  sources,  coupled  with  some  special-purpose 
empirical  data  collection.     It  seems  apparent  that  the  High  School  ASVAB 
Testing  Program  can  be  of  great  value,  and  splicing  it  to  the  production 
AFEES  Testing  Program  seems  a  natural;  reliance  on  and  tying  into  subsequent 
NLS  follow-ups  would  seem  profitable;  lessons  to  be  learned  from  Project 
TALENT  should  be  sought;  there  are  numerous  ad  hoc  and  continuing  Department 
of Labor, Department of Defense (e.g., Gilbert Youth Survey), Bureau of
Labor  Statistics  and  Bureau  of  the  Census  surveys  to  be  examined.  I've 
just  skimmed  the  surface  and  the  obvious.    The  other  participants  today 
will  tell  us  of  prior  experiences  in  this  type  of  endeavor  inside  and 
outside  the  Department  of  Defense  and  I  hope  illuminate  a  way  that  we  can 
cooperatively inventory the military base population of the 1980's.


DEVELOPMENT OF A MOBILIZATION POPULATION INVENTORY USING EXISTING ASVAB
DATA BANKS


The  Military  Enlistment  Processing  Command  (MEPCOM)  annually  tests 
approximately 1.9 million individuals with the Armed Services Vocational
Aptitude Battery (ASVAB). Approximately 800,000 are applicants for
enlistment in the Armed Services and are tested with the ASVAB Form 6 or
7. The applicants are tested at one of the 66 Armed Forces Examining
and Entrance Stations (AFEES) or at one of over 1,000 testing locations
run by the AFEES. Additionally, there are approximately 1.1 million
high school students who are annually administered the ASVAB Form 5 in
their  own  high  schools.     Results  of  this  high  school  version  are  then 
used  to  provide  a  prescreened  list  of  mentally  qualified  prospects  for 
enlistment. 

The mobilization population can be defined as the set of 18-24 year
old American citizens. Presumably this subset of the American population
would be subject to the draft during a national emergency. Naturally,
it would be desirable to test a large unbiased sample of the
mobilization population to develop a new mobilization base, but the cost
would probably be prohibitive. Accordingly, we may have to sacrifice some
theoretical purity in the face of economic constraints. Nevertheless,
the possibilities appear bright for tailoring existing data to develop a
new mobilization base which can be more accurate and of greater utility
than the one currently in use. The data base that appears most feasible
for use in modeling the mobilization population is the high school
sample. There are two reasons for this selection: low cost and low
pre-test bias.

First, the sheer magnitude of the number of students tested, combined
with the demographic data that is coded by the students on their answer
cards, allows for the instant computerized analysis of extremely large
samples. Second, the results obtained on the ASVAB 5 are relatively
unbiased by illegal pretest assistance. Unfortunately, applicants for
enlistment are frequently provided pretest information regarding items
on the ASVAB 6 or 7. This behavior (commonly called compromise) is a
major factor inflating the scores of applicants on the Armed Forces
Qualification Test (AFQT), which is the qualification portion of the
ASVAB. In contrast to the ASVAB 6/7 production testing, the effect of
test compromise in the high school testing program appears to be
negligible. Only 8% of those taking the test initially indicate a desire to
enter the military. Additionally, those who intend to obtain illegal
assistance probably opt for immediate testing on the ASVAB 6/7 rather
than waiting for their high school to schedule the ASVAB 5. Figures 1
and 2 are examples of ASVAB 5 vs ASVAB 6/7 percentile plots for male and
female applicants, respectively, who were administered both tests. The
ASVAB 5 sample appears to be closer to a "normal curve" than the curve
depicting the same sample of individuals who took the ASVAB 6/7 version
of ASVAB.

There are two basic problems with estimating the performance
characteristics of our referenced population, no matter what sample we take:

(1) The motivation of the individuals within the sample.

(2) The appropriateness of the sample.


The motivation factor, unfortunately, has pervasive effects with our
current method of norming. During the draft era, a significant portion
of selective service registrants would intentionally fail the AFQT in
the hope they would not be inducted. Now the problem is reversed. A
significant portion of applicants has received some form of unauthorized
testing assistance so that they will be found mentally qualified for the
service and job of their choice when, in fact, they are unqualified.
The current method of norming new tests requires stratification of a
sample of applicants in the AFEES by AFQT. Unfortunately, since the
existing AFQT is compromised, the new items appear harder because the
stratified sample makes applicants appear more capable than they really
are. Our norms are continually degraded each time this stratification
process takes place because the effects are cumulative. In essence,
we are stratifying the population to ensure the sample is unbiased but
we are not compensating for the overriding effect of test compromise.

While  it  is  difficult  to  quantify  the  extent  of  compromise,  its  effect 
can  be  demonstrated  using  a  verification  composite  first  proposed  by 
Sims  (reference  1).    Figures  3  through  6  are  percentile  plots  based  on 
all  applicants  tested  in  all  AFEES  from  Apr  -  Jun  7fi.    The  qualification 
composite  (AFQT)  is  compared  against  a  composite  based  on  other  non-AFQT 
tests within the ASVAB. For ease of discussion, this verification
composite is called the "pseudo" AFQT. The difference in these curves
reflects different rates of compromise by service. Unfortunately, there
is no readily available "clean" sample of applicants upon which to
measure the true extent of compromise in the current AFQT versions.

There are additional indications of test compromise. MEPCOM recently
instituted a statistical procedure to identify those applicants who had
inconsistently high qualification scores, so that they might be retested.
A "5% screening table" was developed using the cross-tabulated test
scores of a sample of over 40,000 applicants for enlistment. The
screening table is an internal consistency check: the performance on
individual tests within the AFQT is compared to the performance on
highly correlated tests elsewhere in the battery. The tables are
statistically designed to screen the 5% of all applicants whose test
score comparisons appear most aberrant.
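
The idea of such a consistency screen can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not give the construction of the MEPCOM table, so the rule used here (standardize both tests, rank applicants by how far the AFQT test exceeds the correlated non-AFQT test, flag the top 5%) and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores within the sample."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def screen_inconsistently_high(afqt_scores, reference_scores, screen_rate=0.05):
    """Return indices of applicants whose AFQT test score is most inflated
    relative to a correlated non-AFQT test.

    Hypothetical rule for illustration; not the actual MEPCOM screening table.
    """
    z_afqt = standardize(afqt_scores)
    z_ref = standardize(reference_scores)
    # Positive gap: the qualifying test is higher than the reference test predicts.
    gaps = [a - r for a, r in zip(z_afqt, z_ref)]
    n_flag = max(1, round(screen_rate * len(gaps)))
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return sorted(ranked[:n_flag])


# Twenty applicants whose two scores agree, except applicant 5,
# whose AFQT test score is far above the reference test.
reference = list(range(10, 30))
afqt = list(range(10, 30))
afqt[5] = 29
flagged = screen_inconsistently_high(afqt, reference)
```

Applicants flagged this way would be candidates for retesting, not automatic disqualification, which matches the purpose described above.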

Using this screening table on a sample from the AFEES, it is readily
apparent that the AFQT is compromised, especially the Word Knowledge test.
Table 1 shows a comparison of this nature, in which a "clean" sample of
3134 USMC recruits was tested in early 1976, when the ASVAB 6/7 was
initially introduced. Ideally, what is needed is some form of internal
consistency check of item distractors to eliminate applicants who were
"coached" from the sample while still retaining an unbiased sample of
the mobilization population. In this fashion, those individuals who
purposely failed most of the easier distractors could be detected as
inconsistently low. By the same method, those who consistently failed
the difficult distractors yet scored high enough to qualify could be
identified as inconsistently high. Until internal consistency checks are
instituted, the current practice of standardizing tests in the AFEES
using the existing qualification test should be used only as an interim
solution.

A recent preliminary report of the descriptive statistics for the ASVAB
5 (reference 2) indicates that motivation is not a problem. Few
students intentionally do poorly on the ASVAB 5. The students taking
the ASVAB 5 are not, however, a random selection of all high school
students within the nation. For most students the decision to take the
test is voluntary, since the DOD does not require the test. In our
recent analysis, 30% of the students tested indicated a desire to attend
a four-year college program. In addition, a greater percentage of
students from the South elect to take the ASVAB than would be expected
by examining population density statistics. Nevertheless, sample bias
can be overcome. This school year (77-78), for the first time, data is
being differentiated on the basis of which testing sessions are
mandatory and which are optional. (Mandatory sessions are those for which
the high school counselors have elected to test all students within a
given grade.) In essence, it is now possible to obtain representative
statistics on those students who previously chose not to take the ASVAB
because of a predominant interest in attending college. We can now
compensate for sample bias by statistically selecting test data from
mandatory testing sessions whose aggregate population reflects the
demographic characteristics of the nation in terms of the following
characteristics:

(1)  Population  density  (by  zip  code  region) 

(2)  Race 

(3)  Sex 

(4) Plans after graduation.

Initial norms based on high school testing results appear promising.
Figure 7 compares a norm based on a random sample of ASVAB 5 results
against the norms actually used for enlistment qualification. By
demographically stratifying the sample, the curve may be shifted to the
right somewhat, but it is apparent that this student sample will better
describe the mobilization population.

It should be clear from the forms of the two curves that a sample of over
200,000 students will describe a smoother and more representative curve
than a sample of preselected recruits. At the present time MEPCOM is
testing one out of every six seniors in the nation. By the use of
prudent statistical sampling one can have, at reasonable cost, a large
data base that is demographically stratified to reflect an unbiased
sample of the nation's mobilization population.

TABLE 1
SCREENING EFFECT ON
CLEAN AND OPERATIONAL SAMPLES OF USMC RECRUITS

                    CLEAN                    OPERATIONAL
               (3134 RECRUITS)            (4896 RECRUITS)*

ASVAB TEST     (1)         (2)            (3)         (4)
               # FAILED    % FAILED       # FAILED    % FAILED

WK                30        0.96%           322        6.58%
AR                16        0.51%            28        0.57%
SP                11        0.35%            19        0.39%

*April 1978 Marine Corps Applicants.


Air  Force  Experience  with  PROJECT  TALENT 


Lonnie  D.  Valentine,  Jr. 
Air  Force  Human  Resources  Laboratory 
Brooks  Air  Force  Base,  Texas 


Over  the  years,  one  of  the  chief  means  used  by  test  constructors 
for  maintaining  stable  normative  standards  (or  score  meaning)  from  one 
form  of  a  test  to  the  next  has  been  through  equipercentile  conversion 
procedures  in  which  each  form  of  the  test  is  calibrated  such  that  its 
distribution  matches  that  of  a  "reference"  test  which  has  been 
calibrated  for  the  target  population. 
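
As an illustration of the equipercentile idea, the following Python sketch builds a raw-score conversion table by matching percentile ranks between a new form and a reference test. It is a simplified, unsmoothed version written for this example; operational procedures typically smooth and interpolate the distributions, and the function names here are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(sorted_scores, x):
    """Percent of the sample scoring at or below x."""
    return 100.0 * bisect_right(sorted_scores, x) / len(sorted_scores)


def equipercentile_table(new_form_scores, reference_scores):
    """Map each observed new-form raw score to the lowest reference score
    whose percentile rank is at least as high (no smoothing or interpolation)."""
    new_sorted = sorted(new_form_scores)
    ref_sorted = sorted(reference_scores)
    table = {}
    for raw in sorted(set(new_sorted)):
        target = percentile_rank(new_sorted, raw)
        for ref in ref_sorted:
            if percentile_rank(ref_sorted, ref) >= target:
                table[raw] = ref
                break
    return table


# Both tests given to the same (tiny, made-up) calibration sample.
conversion = equipercentile_table([1, 2, 3, 4], [10, 20, 30, 40])
```

After conversion, a given converted score carries the same percentile meaning on the new form as on the reference test, which is what keeps score meaning stable from one form to the next.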

In  the  case  of  the  Armed  Services,  the  most  frequently  used 
enlisted test standardization reference measure has been the Armed
Forces Qualification Test (AFQT). The predecessor test to AFQT was
standardized  during  World  War  II  on  a  very  large  sample,  representative 
of  service-age  young  men. 

Since that time, virtually all service selection and classification
test batteries for enlisted personnel have been calibrated against
those  World  War  II  standards  through  AFQT.    This  has  generally  been 
true  regardless  of  whether  the  score  being  standardized  was  intended  as 
an  alternate  form  of  AFQT. 

For its officer test programs, the Air Force, in 1954-55, adopted
as its standards reference the distribution of ability among Air Force
Academy  (AFA)  applicants.     The  assumption  was  that  academy  applicants 
were  a  select  group  of  young  men  who  could  well  define  the  standard 
against  which  officer  applicants  were  to  be  compared.  Moreover, 
because  AFA  is  a  prestige  program,  ability  levels  of  AFA  applicants 
were  assumed  to  be  fairly  constant  from  one  year's  applicants  to  the 
next.    Prior  to  that  time,  officer  standards  were  calibrated  in  terms 
of  performance  of  World  War  II  aircrew  program  applicants. 

Academy  applicants  did,  in  fact,  prove  to  be  a  select  group  of  young 
men,  but  over  time  the  nature  of  their  selectivity  was  such  that  they 
became  inadequately  representative  of  the  "target"  pool  for  other 
officer  programs.    Their  performance  on  both  verbal  and  quantitative 
ability  measures  was  initially  equivalent/above  average  for  18-year- 
olds,  but,  over  the  first  few  academy  classes,  the  applicants  became 
increasingly  self-selected  on  quantitative  abilities  while  their  levels 
of  verbal  ability  held  constant.     Thus,  if  one  established  norms  for 
successive  AFOQT's  directly  on  raw  score  distributions  among  academy 
applicants  to  successive  classes,  the  verbal  norms  would  have  held 
fairly  constant  within  the  broader  officer  target  population,  but 
quantitative norms would be badly biased (i.e., relatively higher
raw scores would be associated with moderate or low converted scores).

If one considered the sum of College Entrance Examination Board Verbal and
Quantitative scores, indicators of general ability, as the reference measures,
one would have a circumstance in which the use of these reference
measures would result in "easy" verbal standards and "difficult"
quantitative standards.

This brings us to an important principle with respect to equipercentile
norming procedures, specifically: the stronger the
relationship  between  the  normative  reference  measure  and  the  measure  to 
be  standardized,  the  higher  the  probability  that  the  new  norms  will  be 
unaffected  by  atypical  sampling  and  uncontrolled  variables. 

The Air Force began reviewing its test standardization procedures in
light of this principle and concluded that it would be highly desirable
to use a different reference measure for each test or composite to be
normed, with the reference measures selected to correlate as highly as
possible with the score being normed; if an Airman Qualifying
Examination were being normed, it would be desirable to use separate
reference measures for each aptitude index (AI), such as the Electronics AI.
Ideally, these reference measures would be parallel forms of
their counterparts in the new battery and would correlate with them on
about the order of their reliability.

What  was  needed  was  a  large  data  base— lots  of  subjects,  and  a 
broad  content  spectrum  of  measurements  from  which  appropriate  composites 
could  be  developed,  normed,  and  then  used  as  benchmarks. 

At about this time, the American Institutes for Research (AIR) was
starting PROJECT TALENT, a national aptitude census study. Contractual
arrangements were made with AIR for linkage of Air Force tests to the
PROJECT TALENT data base such that a composite of TALENT measures might
be developed as a normative reference for each separate Air Force
selection and/or classification test or composite. The study through
which these reference composites were developed is detailed in an Air
Force technical report by Dailey, Shaycoft, and Orr (1962).

The TALENT Battery was administered to approximately 3,300 basic
airmen, yielding about 2,500 complete cases, stratified by Armed Forces
Qualification Test (AFQT) deciles in the centile range 21-100. The Air
Force provided records of subtest and composite scores for the AFQT,
the Air Force Officer Qualifying Test (AFOQT), and the Airman Qualifying
Exam (AQE) for each airman in the sample.

For the data analysis, the total sample was randomly divided into
two approximately equal subsamples, designated Subsample A and Subsample
B. Much of the data analysis was done separately for the two
subsamples.

In order to pick the best predictive composite for each of the Air
Force variables, multiple regression analyses were run with each one of
the Air Force variables, in turn, as criterion, and with 74 of the
TALENT Battery test scores as predictors.

On the basis of these analyses, sets of predictor variables were
selected to form TALENT Battery reference composites for the Air Force
criterion variables. One restriction on this selection was to limit
the total testing time for the tests predicting AFOQT to 4 hours and
for those predicting AQE to 2 hours. Prediction weights were expressed
as integers, roughly proportional to the Subsample A and Subsample B average
of the raw score regression weights obtained by a stepwise regression
procedure. Typically, these TALENT-based composites correlated about
.8 with the composites they were designed to predict on
cross-application to other samples.
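
The weighting step can be sketched as follows. This is a minimal illustration, assuming that "roughly proportional" integer weights are obtained by averaging the two subsamples' raw-score regression weights and rescaling so the largest weight equals a chosen integer; the scaling constant `largest`, the example weights, and the function names are hypothetical and are not taken from the technical report.

```python
def integer_weights(weights_a, weights_b, largest=10):
    """Average two sets of raw-score regression weights and rescale them to
    integers, with the largest absolute weight mapped to `largest`
    (an assumed convention for this sketch)."""
    averaged = [(a + b) / 2 for a, b in zip(weights_a, weights_b)]
    scale = largest / max(abs(w) for w in averaged)
    return [round(w * scale) for w in averaged]


def composite_score(test_scores, weights):
    """Weighted sum of subtest scores using the integer weights."""
    return sum(s * w for s, w in zip(test_scores, weights))


# Made-up regression weights for three predictor tests in each subsample.
weights = integer_weights([0.40, 0.10, 0.22], [0.60, 0.14, 0.18])
score = composite_score([3, 5, 2], weights)
```

Integer weights of this kind trade a small loss in predictive precision for composites that can be scored by hand, and averaging across the two subsamples guards against weights that fit only one half of the data.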

This study allowed for estimation of the distribution of 18-year-old
performance on the various Air Force tests and provided constant,
highly correlated reference measures for norming future revisions of
the Air Force tests. These reference composites were used for a number
of years in Air Force test norming studies. Air Force experience with
these reference composites leads to a few recommendations I would like
to pass on with respect to an appropriate mobilization base study.

(1)  The  test  battery  for  such  a  study  should  be  quite  broad  both 
in  content  and  difficulty  range.    In  our  experience,  the  TALENT  Battery 
was  quite  adequate  for  enlisted  reference  composites;  however,  for 
officer  test  norming  studies,  more  "top"  on  the  battery  would  have  been 
desirable. It would seem entirely appropriate to define the mobilization
base in terms of the manpower pool available both for enlisted and
officer specialties; thus, measures in the battery should accommodate a
broad  spectrum  of  ability.    Broad  content  coverage  is  needed  both  to 
permit  initial  development  of  highly  relevant  reference  composites  and 
to  permit  later  development  of  new  reference  composites  as  tests  and 
test  programs  change. 

(2)  The  study's  sampling  plan  should  include  adequately  large 
representation of the potential officer pool. One product of the study
should  be  standards  against  which  officer  and  aircrew  tests  can  be 
normed.     It  would  be  desirable  to  code  data  on  participants  in  the  study 
such  that  specific  subpopulations  of  possible  service  interest  may  be 
identified  and  used  as  a  standards  reference. 

(3)  The  study  should  provide  a  standards  reference  data  base 
which  may  be  extended  as  service  tests  change  over  time.  While 
established  reference  composites  should  be  retained  in  the  data  base, 
specific  subtest  data  should  also  be  retained  in  an  easily  accessible 
form.    Whenever  new  service  tests  are  developed,  it  should  be  possible 
to  relate  the  new  test  to  the  subtests  of  the  mobilization  base 
battery,  to  develop  a  reference  composite  for  the  new  test,  and  to 
establish reference conversion standards in the mobilization base file.

Basically, a mobilization survey should have broad content coverage,
encompass a broad ability range, and be easily exercised to
form highly correlated reference standards for both current and future
service tests.


The Impact of Valid Selection Procedures on Workforce Productivity

Abstract 

This study reports evidence showing that the impact of valid selection
procedures on workforce productivity is much greater than personnel
psychologists have typically believed. The Brogden-Cronbach decision theoretic
models of selection utility are presented and explained. The three major
reasons for the failure of personnel psychologists to make wide use of
these equations are presented and shown to be faulty. Decision theoretic
equations are used to estimate the impact on productivity of a valid test
if used to select new computer programers for one year in (a) the Federal
Government and (b) the national economy. The test analyzed is the
Programer Aptitude Test (PAT), which previous validity generalization research
(Rosenberg, Schmidt, and Hunter, Note 1) has shown to have substantial
and generalizable validity. A newly developed technique is used to estimate
\(SD_y\), the standard deviation of the dollar value of employee job performance,
the item of required information that has been most difficult and expensive
to estimate in the past. Results are presented separately for the Federal
Government and the U.S. economy. For both, results are presented for different
selection ratios and for different assumed values for the validity of
previously used selection procedures. The impact of PAT on programer
productivity was shown to be substantial for all combinations of assumptions.
The results support the conclusion that hundreds of millions of dollars in
increased productivity could be realized by increasing the validity of
selection decisions in this occupation. Likely similarities between
computer programers and other occupations are also discussed.


The Impact of Valid Selection Procedures on Workforce Productivity

Questions concerning the economic and productivity implications of valid
selection procedures have come increasingly to the fore in
industrial-organizational psychology. The recent Annual Review of Psychology chapter by
Dunnette and Borman (1979) includes, for the first time, a separate section
on the utility and productivity implications of selection methods. This
development is due at least in part to the emphasis placed on the practical
utility of selection procedures in recent years in some of the litigation
involving selection tests. Hunter and Schmidt (1979) have contended, on the
basis of a review of the empirical literature on the economic utility of
selection procedures, that personnel psychologists have typically failed to
appreciate the magnitude of productivity gains that result from use of valid
selection procedures. The major purpose of this study is to illustrate the
productivity (economic utility) implications of a valid selection procedure
in the occupation of computer programer in the Federal Government and in
the economy as a whole.

History  and  Development  of  Selection  Utility  Models 

The evaluation of benefit obtained from selection devices has been a
problem of continuing interest in industrial psychology. Most attempts to
evaluate benefit have focused on the validity coefficient, and at least
five approaches to the interpretation of the validity coefficient have been
advanced over the years. The oldest of these is the Index of Forecasting
Efficiency, symbolized E:

\[ E = 1 - \sqrt{1 - r_{xy}^{2}} \]

where \(r_{xy}\) is the validity coefficient. This index compares the
standard error of job performance scores predicted by means of the test
(the standard error of estimate) to the standard error that results when
there is no valid information about applicants and one predicts the mean
level of performance for everyone (the standard deviation of job
performance). The index of forecasting efficiency was heavily emphasized
in early texts (Kelley, 1923; Hull, 1928) as the appropriate means for
evaluating the value of a selection procedure. This index describes a test
correlating .50 with job performance as predicting only 13% better than
chance, a very unrealistic and pessimistic interpretation of the test's
economic value.

The index of forecasting efficiency was succeeded by the coefficient
of determination, which became popular during the 1930's and 1940's.
The coefficient of determination is simply the square of the validity
coefficient, or \(r_{xy}^{2}\). This coefficient was referred to as "the
proportion of variance in the job performance measure accounted for" by the
test. The coefficient of determination describes a test with a validity of
.50 as "accounting for" 25% of the variance of job performance. Although
\(r_{xy}^{2}\) is still occasionally referred to by selection psychologists (and
has surfaced in litigation on personnel tests), the "amount of variance
accounted for" has no direct relationship to productivity gains resulting
from use of a selection device.
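
The two figures quoted for a validity of .50 (13% and 25%) follow directly from the definitions of the two indices. The short Python check below is added for illustration only; the function names are ours.

```python
from math import sqrt


def forecasting_efficiency(r_xy):
    """Index of forecasting efficiency: E = 1 - sqrt(1 - r_xy^2)."""
    return 1 - sqrt(1 - r_xy ** 2)


def coefficient_of_determination(r_xy):
    """Square of the validity coefficient."""
    return r_xy ** 2


e_half = forecasting_efficiency(0.50)          # about 0.134
r2_half = coefficient_of_determination(0.50)   # 0.25
```

Both indices shrink a validity of .50 to a much smaller-looking number, which is why they suggest that only highly valid tests have practical value.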

Both E and \(r_{xy}^{2}\) lead to the conclusion that only tests with relatively
high correlation with job performance will have significant practical value.
Neither of these interpretations recognizes that the value of a test varies
as a function of the parameters of the situation in which it is used. They
are general interpretations of the correlation coefficient and have been
shown to be inappropriate for interpreting the validity coefficient in
selection (Brogden, 1946; Cronbach and Gleser, 1965, p. 31; Curtis and
Alf, 1969).

The well-known interpretation developed by Taylor and Russell (1939)
goes beyond the validity coefficient itself and takes into account two
properties of the selection problem: the selection ratio (the proportion
of applicants hired) and the base rate (the percentage of applicants who
would be "successful" without use of the test). This model yields a much
more realistic interpretation of the value of selection devices. The
Taylor-Russell model indicates that even a test with a modest validity can
substantially increase the percentage who are successful among those
selected when the selection ratio is low. For example, when the base rate
is 50 percent and the selection ratio is .10, a test with validity of only
.25 will increase the percentage among the selectees who are successful
from 50 percent to 67 percent, a gain of 17 additional successful employees
per 100 hired. Although an improvement, the Taylor-Russell approach to
determining selection utility does have disadvantages. Foremost among them
is the need for a dichotomous criterion. Current employees and new hires
must be sorted into an unrealistic two-point distribution of job
performance: "successful" and "unsuccessful" (or "satisfactory" and
"unsatisfactory"). The decision as to where to draw the line to create the dichotomy
is arbitrary. But more important than this is the fact that information on
levels of performance within each group is lost (Cronbach & Gleser, 1965,
123-124, 138). All those within the "successful" group, for example, are
implicitly assumed equal in value, whether they perform in an outstanding
manner or barely exceed the cut-off. This fact makes it difficult to
express utility in units that are comparable across situations.
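
The Taylor-Russell figure quoted above can be reproduced numerically. The sketch below assumes, as the Taylor-Russell tables do, that test scores and job performance are bivariate normal; the midpoint-rule integration, the step count, and the function name are choices made for this illustration rather than part of the original model description.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def success_rate_among_selected(validity, base_rate, selection_ratio, steps=2000):
    """Proportion of selectees who are 'successful', assuming test score and
    job performance are bivariate normal with correlation `validity`
    (requires 0 < selection_ratio < 1 and |validity| < 1)."""
    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)  # test cut score
    y_cut = STD_NORMAL.inv_cdf(1 - base_rate)        # performance cut score
    residual_sd = sqrt(1 - validity ** 2)
    width = 8.0 / steps  # integrate over 8 SDs above the cut (tail beyond is negligible)
    total = 0.0
    for i in range(steps):
        x = x_cut + (i + 0.5) * width  # midpoint of the i-th slice
        p_success = 1 - STD_NORMAL.cdf((y_cut - validity * x) / residual_sd)
        total += STD_NORMAL.pdf(x) * p_success * width
    return total / selection_ratio


# Base rate .50, selection ratio .10, validity .25 (the example in the text).
rate = success_rate_among_selected(0.25, 0.50, 0.10)
```

With these inputs the computed rate is close to the 67 percent cited in the text; with a validity of zero it falls back to the base rate, as it should.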

The next major advance was left to Brogden (1949), who used the
principles of linear regression to demonstrate how the selection ratio
(SR) and the standard deviation of job performance in dollars (\(SD_y\)) affect
the economic utility of a selection device. Despite the fact that Brogden's
derivations are a landmark in the development of selection utility models,
they are very straightforward and simple to understand.

Let \(r_{xy}\) = the correlation between the test (x) and job performance
measured in dollar value (y). The basic linear model is:

\[ Y = \beta Z_x + \mu_y + e \]

where:

\(Y\) = job performance measured in dollar value;

\(\beta\) = the linear regression weight on test scores for predicting job
performance;

\(Z_x\) = test performance in standard score form in the applicant group;

\(\mu_y\) = mean job performance (in dollars) of randomly selected
employees; and

\(e\) = error of prediction.

This equation applies to the job performance of an individual. The equation
which gives the average job performance for the selected (s) group (or for
any other subgroup) is:

\[ E(Y_s) = E(\beta Z_{x_s}) + E(\mu_y) + E(e) \]

Since \(E(e) = 0\), and \(\beta\) and \(\mu_y\) are constants, this becomes:

\[ \bar{Y}_s = \beta \bar{Z}_{x_s} + \mu_y \]

This equation can be further simplified by noting that \(\beta = r_{xy}(SD_y/SD_x)\), where
\(SD_y\) is the standard deviation of job performance measured in dollar value among
randomly selected employees. Since \(SD_x = 1.00\), \(\beta = r_{xy} SD_y\). We thus obtain:

\[ \bar{Y}_s = r_{xy} SD_y \bar{Z}_{x_s} + \mu_y \]

This equation gives the absolute dollar value of average job performance in
the selected group. What is needed is an equation which gives the increase
in dollar value of average performance that results from using the test.
Note that if the test were not used, \(\bar{Y}_s\) would be \(\mu_y\). That is, mean
performance in the selected group would be the same as mean performance in a group
selected randomly from the applicant pool. Thus the increase due to use of
a valid test is \(r_{xy} SD_y \bar{Z}_{x_s}\). The equation we want is produced by transposing
\(\mu_y\) to give:

\[ r_{xy} SD_y \bar{Z}_{x_s} = \bar{Y}_s - \mu_y \]
The  value  on  the  right  in  the  above  equation  is  the  difference  between  mean 
productivity  in  the  group  selected  using  the  test  and  mean  productivity  in 
a  group  selected  without  using  the  test,  that  is,  a  group  selected  randomly. 
The  above  equation  thus  gives  mean  gain  in  productivity  per  selectee  resulting 
from  use  of  the  test,  i.e., 

\[ \Delta U/\text{selectee} = r_{xy} SD_y \bar{Z}_{x_s} \qquad (1) \]

where U is utility and \(\Delta U\) is marginal utility.

Equation (1) states that the average productivity gain in dollars per
person hired is the product of the validity coefficient, the average standard
score on the test of those hired, and the SD of job performance in dollars.
The value \(r_{xy} \bar{Z}_{x_s}\) is the mean standard score on the dollar criterion of those
selected, \(\bar{Z}_y\). Thus utility per selectee is the mean Z-score on the criterion
of those selected times the standard deviation of the criterion in dollars.
The only assumption that Equation (1) makes is that the relation between
the test and job performance is linear. If we further assume that the test
scores are normally distributed, the mean test score of those selected is
\(\phi/p\), where:

\(p\) = the selection ratio, and

\(\phi\) = the ordinate in N(0,1) at the point of cut corresponding to \(p\).

Thus Equation (1) can be written:

\[ \Delta U/\text{selectee} = r_{xy} (\phi/p) SD_y \qquad (2) \]

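
Equation (2) is easy to evaluate directly. The following Python sketch computes \(\phi/p\) from the selection ratio and then the gain per selectee; the function names are ours, and any dollar value of \(SD_y\) passed in is an input assumption, not a figure from this study.

```python
from statistics import NormalDist


def mean_z_of_selected(selection_ratio):
    """phi/p: mean standard test score of those above the cut, assuming
    normally distributed test scores and top-down selection
    (requires 0 < selection_ratio < 1)."""
    std_normal = NormalDist()
    cut = std_normal.inv_cdf(1 - selection_ratio)
    return std_normal.pdf(cut) / selection_ratio


def utility_per_selectee(validity, selection_ratio, sd_y):
    """Equation (2): validity * (phi/p) * SD_y, in the same units as sd_y."""
    return validity * mean_z_of_selected(selection_ratio) * sd_y


z_top_tenth = mean_z_of_selected(0.10)  # about 1.755
z_top_half = mean_z_of_selected(0.50)   # about 0.798
```

The mean standard score of those selected rises as the selection ratio falls, which is why the same test is worth more per hire to an employer who can afford to be selective.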
The above equations illustrate the critical role of \(SD_y\) and suggest
the possibility of situations in which tests of low validity have higher
utility than tests of high validity. For example:

                                         r_xy    mean Z_xs    SD_y       ΔU/selectee

Mid-level job (e.g., systems analyst)    .20     1.00         $25,000    $5,000
Lower level job (e.g., janitor)          .60     1.00         $2,000     $1,200

The total utility of the test depends on the number of persons hired.
The total utility (total productivity) gain resulting from use of the test is
simply the mean gain per selectee times the number of people selected, \(N_s\).
That is, the total productivity gain is:

\[ \Delta U = N_s r_{xy} SD_y \bar{Z}_{x_s} \]

In this example, the average marginal utilities are $5,000 and $1,200. If 10
people were hired, the actual utilities would be $50,000 and $12,000,
respectively. If 1,000 people were to be hired, then the utilities would be
$5,000,000 and $1,200,000, respectively. Obviously the total dollar value of
tests is greater for large employers than for small employers. However,
this fact can be misleading: on a percentage basis it is average gain
in utility that counts, and that is what counts to each individual
employer.
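
The arithmetic of the example can be checked with a few lines of Python. The function below simply multiplies out the total-gain expression (number selected times validity times \(SD_y\) times the mean standard test score of those selected); the function name is ours, and the inputs are the illustrative values from the example above.

```python
def total_utility(n_selected, validity, sd_y, mean_z_selected):
    """Total productivity gain in dollars: N_s * r_xy * SD_y * mean Z of selectees."""
    return n_selected * validity * sd_y * mean_z_selected


# Ten hires in each of the two illustrative jobs.
mid_level_gain = total_utility(10, 0.20, 25_000, 1.00)   # 50,000 dollars
lower_level_gain = total_utility(10, 0.60, 2_000, 1.00)  # 12,000 dollars
```

Because the number hired enters as a simple multiplier, the ranking of the two jobs by gain per selectee is unchanged no matter how many people are hired.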

Equations (1) and (2) clearly illustrate the basis for Brogden's (1946)
conclusion that the validity coefficient itself is a direct index of
selective efficiency. Brogden (1946) showed that, given only the assumption of
linearity, the validity coefficient is the proportion of maximum utility
attained, where maximum utility is the productivity gain that would result
from a perfectly valid test. A test with a validity of .50, for example,
can be expected to produce 50% of the gain that would result from a perfect
(validity = 1.00) selection device used in the same setting and at the
same selection ratio. A glance at Equation (1) or Equation (2) verifies
this verbal statement. Since the validity coefficient enters the equation
as a multiplicative factor, increasing or decreasing the validity by any
factor will increase or decrease the utility by the same factor. For
example, if we increase validity by a factor of two by raising it from
.20 to .40, Equation (2) shows that utility doubles. If we decrease
validity by a factor of one-half by lowering it from 1.00 to .50, utility
is cut in half. Equations (1) and (2) also illustrate the fact that there
are limitations on the utility of even a perfectly valid selection device.
If the selection ratio is very high, the term \(\phi/p\) (or \(\bar{Z}_{x_s}\)) approaches zero
and even a perfect test has little value. If the selection ratio is 1.00,
the perfect test has no value at all. Likewise, as \(SD_y\) decreases, the
utility of even a perfect test decreases. In a hypothetical world in
which \(SD_y\) were zero, even a perfect test would have no value.

Brogden (1946) further showed that the validity coefficient could
be expressed as the following ratio:

\[ r_{xy} = \frac{\bar{Z}_{y(x)} - \bar{Z}_{y(r)}}{\bar{Z}_{y(y)} - \bar{Z}_{y(r)}} \]

where:

\(\bar{Z}_{y(x)}\) = the mean job performance (y) standard score for those
selected using the test (x);

\(\bar{Z}_{y(y)}\) = the mean job performance standard score resulting if
selection were on the criterion itself, at the same
selection ratio;

\(\bar{Z}_{y(r)}\) = the mean job performance standard score resulting if
selection decisions were made randomly (from among the
otherwise screened pool of applicants); and

\(r_{xy}\) = the validity coefficient.

Since \(\bar{Z}_{y(r)} = 0\) by definition, the above formula reduces to \(\bar{Z}_{y(x)}/\bar{Z}_{y(y)}\).

This formulation has implications for the development of new methods of
estimating selection procedure validity. If reasonably accurate estimates
of both $\bar{Z}_{y(x)}$ and $\bar{Z}_{y(y)}$ can be obtained, validity can
be estimated without conducting a traditional validity study. Further,
estimates produced by a procedure of this kind would be unaffected by range
restriction and criterion unreliability.
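
The ratio interpretation can be illustrated with a small simulation. The sketch below is not part of Brogden's derivation: it assumes bivariate normal test and performance scores, draws a hypothetical applicant pool, and compares the mean performance of those selected on the test with the mean performance of those selected on the criterion itself. All names and parameter values are illustrative.

```python
import random


def mean_of_top(scores, ranking_keys, p):
    """Mean of `scores` for the top proportion p of cases ranked by `ranking_keys`."""
    n_selected = max(1, round(p * len(scores)))
    order = sorted(range(len(scores)), key=ranking_keys.__getitem__, reverse=True)
    return sum(scores[i] for i in order[:n_selected]) / n_selected


def brogden_ratio(r_xy, p, n=50_000, seed=0):
    """Simulated (Zy(x) - Zy(r)) / (Zy(y) - Zy(r)) for validity r_xy and SR p."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_weight = (1.0 - r_xy ** 2) ** 0.5
    # Performance correlates r_xy with the test in the simulated population.
    y = [r_xy * xi + noise_weight * rng.gauss(0.0, 1.0) for xi in x]
    mean_y = sum(y) / n
    sd_y = (sum((yi - mean_y) ** 2 for yi in y) / n) ** 0.5
    z_y = [(yi - mean_y) / sd_y for yi in y]   # performance in standard scores
    z_random = sum(z_y) / n                    # random selection: zero by construction
    z_test = mean_of_top(z_y, x, p)            # selection on the test
    z_criterion = mean_of_top(z_y, z_y, p)     # selection on the criterion itself
    return (z_test - z_random) / (z_criterion - z_random)


print(round(brogden_ratio(0.50, 0.20), 2))
```

With a large simulated pool the ratio should fall close to the validity used to generate the data, apart from sampling fluctuation.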

In Equations (1) and (2), the values for $r_{xy}$ and $SD_y$ should be those
which would hold if applicants were hired randomly with respect to test
scores. That is, they should be values applicable to the applicant
population, the group in which the selection procedure is actually used.
Values of $r_{xy}$ and $SD_y$ computed on incumbents will typically be
underestimates because of reduced variance among incumbents on both test
and job performance measures. Values of $r_{xy}$ computed on incumbents can
be corrected for range restriction to produce estimates of the value in the
applicant pool (Thorndike, 1949, pp. 169-176). The applicant pool is made up
of all who have survived screening on any prior selection hurdles that might
be employed, e.g., minimum educational requirements, physical examinations,
etc.
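
The text cites Thorndike (1949) for the range restriction correction without reproducing a formula. The sketch below assumes direct restriction on the test only, the situation commonly called Thorndike's Case II; whether that case applies in a given study is a judgment the analyst must make. The function name and the example values are hypothetical.

```python
def correct_for_range_restriction(r_incumbents, sd_applicants, sd_incumbents):
    """Estimate applicant-pool validity from incumbent validity.

    Assumes direct restriction on the test (Thorndike's Case II):
        r_c = r * u / sqrt(1 - r**2 + r**2 * u**2),  u = SD_applicants / SD_incumbents
    """
    if sd_applicants <= 0 or sd_incumbents <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_applicants / sd_incumbents
    r = r_incumbents
    return r * u / (1.0 - r ** 2 + (r * u) ** 2) ** 0.5


# Hypothetical example: observed validity .30 among incumbents whose test
# score SD is two-thirds of the applicant SD.
print(round(correct_for_range_restriction(0.30, 15.0, 10.0), 3))
```

When the two standard deviations are equal the function returns the observed validity unchanged, as it should.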

The correlation between the test and a well developed measure of job
performance ($y'$) provides a good estimate of $r_{xy}$, the correlation of
the test with job performance measured in dollars (productivity). It is a
safe assumption that job performance and the value of that performance are
at least monotonically related. It is inconceivable that lower performance
could have greater dollar value than higher performance. Ordinarily, the
relation between $y'$ and $y$ will be not only monotonic but also linear. If
there are departures from linearity, the departures will typically be
produced by leniency in job performance ratings, which leads to ceiling
effects in the measuring instrument. The net effect of such ceiling
effects is to make the test's correlation with the measure of job
performance smaller than its correlation with actual performance, that is,
smaller than its true value, making $r_{xy'}$ an underestimate of $r_{xy}$.
An alternative statement of this effect is that ceiling effects due to
leniency produce an artificial nonlinear relation between job performance
ratings and the actual dollar value of performance. A nonlinear relation
of this form would lead to an underestimation of selection utility because
the performance measure underestimates the relative value of very high
performers. Values of $r_{xy'}$ should also be corrected for attenuation due
to errors of measurement in the criterion. Random error in the observed
measure of job performance causes the test's correlation with that measure
to be lower than its correlation with actual job performance. Since it is
the correlation with actual performance that determines test utility, it
is the attenuation-corrected estimate that is needed in the utility formulas.
This estimate is simply $r_{xy'}/\sqrt{r_{y'y'}}$, where $r_{y'y'}$ is the
reliability of the performance measure. (See Schmidt, Hunter, & Urry, 1976,
for further discussion of these points.)
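
The attenuation correction just described amounts to a single division. A minimal sketch follows; the function name and the numerical values are hypothetical.

```python
def correct_for_attenuation(r_observed, criterion_reliability):
    """Divide the observed validity by the square root of criterion reliability."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return r_observed / criterion_reliability ** 0.5


# Hypothetical example: observed validity .30 against supervisory ratings
# whose reliability is .64.
print(round(correct_for_attenuation(0.30, 0.64), 3))  # 0.375
```

Only the criterion is corrected here, in keeping with the text: unreliability in the test itself is part of the selection procedure as actually used.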

The next major advance in this area came in the form of the monumental
work by Cronbach and Gleser, Psychological Tests and Personnel Decisions.
First published in 1957, this work was republished in 1965 in augmented
form. The book consists of detailed and sophisticated application of
decision theory principles not only to the single-stage fixed-job selection
decisions which we have thus far discussed, but also to placement and
classification decisions and sequential selection strategies. In these
latter areas, many of their derivations were indeed new to the field of
personnel testing. Their formulas for utility in the traditional selection
setting, however, turn out upon examination to be identical to those
of Brogden (1949), except for the fact that they formally incorporate cost
of testing ("information gathering") into the equations.

Brogden, it will be recalled, approached the problem from the point of
view of mean gain in utility per selectee. Cronbach and Gleser (1965,
chapter 4) derived their initial equation in terms of mean gain per
applicant. Their initial formula was (ignoring cost of testing for the
moment):

$$\Delta U/\text{applicant} = r_{xy} \, SD_y \, \phi$$

All terms are as defined earlier. Multiplying by the number of applicants,
$N$, yields total or overall gain in utility. The Brogden formula for
overall utility is:

$$\Delta U = N_s \, \Delta U/\text{selectee} = N_s \, r_{xy} \, SD_y \, \phi/p \tag{3}$$

$N_s$, it will be recalled, is the number selected. If we note that
$p = N_s/N$, i.e., the ratio of selectees to applicants, we find that
Brogden's equation immediately reduces to the Cronbach-Gleser (1965)
equation for total utility:

$$\Delta U = N \, r_{xy} \, SD_y \, \phi$$
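
The equivalence of the two totals can be confirmed numerically. The sketch below assumes normally distributed test scores and uses hypothetical values for the number of applicants, the validity, and the dollar standard deviation; the function names are illustrative only.

```python
from statistics import NormalDist


def ordinate_at_cut(p):
    """Normal ordinate (phi) at the cut score that passes a proportion p."""
    std = NormalDist()
    return std.pdf(std.inv_cdf(1.0 - p))


def total_gain_per_selectee_form(n_selected, r_xy, sd_y, p):
    """Brogden's total: number selected times gain per selectee (Equation 3)."""
    return n_selected * r_xy * sd_y * ordinate_at_cut(p) / p


def total_gain_per_applicant_form(n_applicants, r_xy, sd_y, p):
    """Cronbach-Gleser total: number of applicants times gain per applicant."""
    return n_applicants * r_xy * sd_y * ordinate_at_cut(p)


N_APPLICANTS, SR = 1000, 0.10          # hypothetical applicant pool and SR
N_SELECTED = N_APPLICANTS * SR         # 100 selectees
brogden = total_gain_per_selectee_form(N_SELECTED, 0.40, 10_000.0, SR)
cronbach_gleser = total_gain_per_applicant_form(N_APPLICANTS, 0.40, 10_000.0, SR)
print(round(brogden), round(cronbach_gleser))
```

The two printed totals agree, since the selection ratio in the denominator of the first form cancels against the number selected.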

Role  of  the  Cost  of  Testing 

The previous section ignored the cost of testing, which is quite
reasonable in most testing situations. For example, in a typical job
situation, the applicant pool consists of people who walk through the door
and ask for a job (i.e., there are no recruiting costs). Hiring is then done
on the basis of an application blank and a test which are administered by a
trained clerical worker at a cost of 10 dollars or so. If the selection
ratio is 10%, then the cost of testing per person hired is 10 dollars for
each person hired and 90 dollars for the nine persons rejected in finding
the person hired, or 100 dollars altogether. This is negligible in relation
to the usual magnitude of utility gains. Furthermore, this 100 dollars is a
one time cost whereas utility gains continue to accumulate over as many
years as the person hired stays with the organization. When cost of testing
is included, Equation (2) becomes:

$$\Delta U/\text{selectee} = r_{xy} \, SD_y \, \phi/p - C/p \tag{4}$$
where $C$ is the cost of testing one applicant.

Although cost of testing typically has only a trivial impact on selection
utility, it is possible to conjure up hypothetical situations in which
cost plays a critical role. For example, suppose an employer were recruiting
one individual for a sales position that would last only one year. Suppose
further that the employer decides to base the selection on the results of
an assessment center which costs $1000 per assessee and has a true validity
of .40. If the yearly value of $SD_y$ for this job is $10,000, and 10
candidates are assessed, the expected gain in productivity is
.4($10,000)(1.758) or $7034. However, the cost of the assessment center is
10(1000) = $10,000, which is $2966 greater than the expected productivity
gain. That is, under these conditions it would cost more to test 10 persons
than would be gained in improved performance. If the employer tested only
five candidates, then the expected gain in performance would be 5607 dollars
while the cost of testing would be $5000, for an expected gain of 607
dollars. In this situation, the optimal number to test is three persons. The
best person of three would have an expected gain in performance of $4469
with a cost of testing of 3000 dollars, for an expected utility of 1469
dollars.

Relation  Between  SR  and  Utility 

In most situations, the number to be hired is fixed by organizational
needs. If the applicant pool is also fixed, the question of which SR would
yield maximum utility becomes academic. The SR is determined by
circumstances and is not under the control of the employer. However,
employers can often exert some control over the size of the applicant pool
by increasing or decreasing recruiting efforts. If this is the case, the
question is then how many applicants the employer should test to obtain
the needed number of new employees in order to maximize productivity gains
from selection. This question can be answered using a formula given by
Cronbach and Gleser (1965, p. 309):

where $Z_x$ is the cutting score on the test in Z score form. This equation
must be solved by iteration. Only one value of the SR (i.e., p) will satisfy
this equation and p will always be less than or equal to .50. The value
computed for the optimal SR indicates the number that should be tested in
relation to the number to be selected. For example, if the number to be
selected is 100 and this equation indicates that the optimal SR is .05, the
employer will maximize selection utility by recruiting and testing 2000
candidates (100/.05 = 2000). The cost of recruiting additional applicants
beyond those available without recruitment efforts must be incorporated into
the cost of testing term, C. C then becomes the average cost of recruiting
and testing one applicant. The lower the cost of testing and recruiting, the
larger the number of applicants it is profitable to test in selecting a given
number of new employees. Since the cost of testing is typically quite low
relative to productivity gains from selection, the number tested should
typically be large relative to the number selected.
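
The iterative solution can be sketched as follows. The Cronbach-Gleser equation itself is not reproduced here. Instead, the sketch maximizes Equation (4) directly for a fixed number of hires, using the first-order condition $\phi - p Z_x = C/(r_{xy} SD_y)$. That condition is derived for this illustration from Equation (4) under the assumption of normal test scores and should be read as an assumption, not as a transcription of the published formula. Function names and input values are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def net_gain_per_selectee(r_xy, sd_y, cost, p):
    """Equation (4): r_xy * SD_y * phi / p - C / p."""
    cut = STD.inv_cdf(1.0 - p)
    return (r_xy * sd_y * STD.pdf(cut) - cost) / p


def optimal_selection_ratio(r_xy, sd_y, cost):
    """Bisect on the cut score z until phi(z) - p(z) * z equals C / (r_xy * SD_y)."""
    target = cost / (r_xy * sd_y)
    lo, hi = -8.0, 8.0  # bracket for the cut score in Z-score units
    for _ in range(200):
        mid = (lo + hi) / 2.0
        left_side = STD.pdf(mid) - (1.0 - STD.cdf(mid)) * mid
        if left_side > target:
            lo = mid  # left side falls as z rises, so move the cut score up
        else:
            hi = mid
    return 1.0 - STD.cdf((lo + hi) / 2.0)


# Hypothetical inputs: validity .40, SD_y = $10,000, $10 to recruit and test
# one applicant, 100 people to be hired.
p_star = optimal_selection_ratio(0.40, 10_000.0, 10.0)
print(round(p_star, 4), round(100 / p_star))
```

Raising the cost per applicant in this sketch raises the optimal SR, so that fewer applicants are tested per person hired, which is the qualitative pattern described in the text.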

In situations in which the applicant pool is constant, statements
about optimal SR's typically do not have practical value, since the SR is
not under the control of the employer. Given a fixed applicant pool,
$\Delta U$/selectee increases as the SR decreases if cost of testing is not
considered. Brogden (1949) showed that, when cost of testing is taken into
account and when this cost is unusually high, $\Delta U$/selectee will be
less at very low SR's than at somewhat higher SR's. If cost of testing per
applicant is very high, cost of testing per selectee can become greater at
extremely low SR's than $\Delta U$/selectee, producing a loss rather than a
gain in utility. In practice, however, the combination of extremely high
testing costs and extremely low SR's that could lead to negative utilities
occurs rarely, if ever. When the applicant pool is fixed, the SR that is
optimal for $\Delta U$/selectee is not necessarily the optimal SR for total
gain in utility. Cronbach and Gleser showed that total utility is always
greatest when the SR falls at .50. As SR decreases from .50,
$\Delta U$/selectee increases until it reaches its maximum, the location of
which depends on the cost of testing. But as $\Delta U$/selectee increases,
the number of selectees, $N_s$, is decreasing, and the product
$N_s \, \Delta U$/selectee, or total utility, is also decreasing. In a
fixed applicant pool, total gain is always greatest when 50 percent are
selected and 50 percent are rejected (Cronbach and Gleser, 1965, pp. 38-40).
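
The contrast between gain per selectee and total gain in a fixed pool can be tabulated directly. The sketch below assumes normal test scores and ignores the cost of testing, which in a fixed pool is the same at every SR. Pool size, validity, and dollar values are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def gain_per_selectee(r_xy, sd_y, p):
    """r_xy * SD_y * phi / p, ignoring cost of testing."""
    return r_xy * sd_y * STD.pdf(STD.inv_cdf(1.0 - p)) / p


def total_gain(n_applicants, r_xy, sd_y, p):
    """Number selected (n_applicants * p) times gain per selectee."""
    return n_applicants * p * gain_per_selectee(r_xy, sd_y, p)


ratios = [i / 100 for i in range(1, 100)]  # candidate SR's from .01 to .99
best_for_total = max(ratios, key=lambda p: total_gain(1000, 0.40, 10_000.0, p))
print(best_for_total)
for p in (0.10, 0.30, 0.50, 0.70):
    print(p, round(gain_per_selectee(0.40, 10_000.0, p)),
          round(total_gain(1000, 0.40, 10_000.0, p)))
```

Gain per selectee rises steadily as the SR falls, while the total peaks at an SR of .50, where the normal ordinate is at its maximum.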

Reasons for Failure to Employ Selection Utility Models

Despite the availability since 1949 of the utility equations discussed
above, applied differential psychologists have been notably slow in carrying
out decision-theoretic utility analyses of selection procedures. In our
judgement, the sparsity of work in this area is primarily traceable to three
facts. First, many psychologists believe that the utility equations presented
above are of no value unless the data exactly fit the linear homoscedastic
model and all marginal distributions are normal. Many reject the model in
the belief that their data do not perfectly meet the assumptions.

Second, psychologists once believed that validity is situationally
specific, that there are subtle differences in the performance requirements
of jobs from situation to situation that produce (nontrivial) differences in
test validities. If this were true, then the results of a utility analysis
conducted in a given setting could not be generalized to apparently identical
test-job combinations in new settings. Combined with the belief that utility
analyses must include costly cost accounting applications, it is easy to see
why belief in situational specificity of test validities would lead to
reluctance to carry out utility analyses.

Third, it has been extremely difficult in most cases to obtain all the
information called for by the equations. The SR and cost of testing can be
determined reasonably accurately and at relatively little expense. The
item of information that has been most difficult to obtain is the needed
estimate of $SD_y$ (Cronbach & Gleser, 1965, p. 121). It has generally been
assumed that $SD_y$ can be estimated only by the use of costly and
complicated cost accounting methods. These procedures involve first costing
out the dollar value of the job behaviors of each employee (Brogden &
Taylor, 1950) and then computing the standard deviation of these values. In
an earlier review (Hunter & Schmidt, 1978), we were able to locate only two
studies in which cost accounting procedures were used to estimate $SD_y$. In
this study, we will present an alternative to cost accounting estimates of
$SD_y$.

Are the Statistical Assumptions Met?

The linear homoscedastic model includes three assumptions:

1.  Linearity. 

2.  Equality  of  variances  of  conditional  distributions. 

3.  Normality of conditional distributions.

As we have shown above, the basic selection utility equation [Equation (1)]
depends only on linearity. Equation (2) does assume normality of the test
score distribution. However, Brogden (1949) and Cronbach and Gleser (1965)
introduced this assumption essentially for derivational convenience: it
provides an exact relation between the SR and $\bar{Z}_x$. One need not use
the normality-based relation $\phi/p = \bar{Z}_x$ to compute $\bar{Z}_x$. The
value of $\bar{Z}_x$ can be computed directly. Thus in the final analysis,
linearity is the only required assumption.

To what extent do data in differential psychology fit the linear
homoscedastic model? To answer this question, we must of necessity
examine sample rather than population data. However, it is only conditions
in populations that are of interest; sample data are of interest only as a
means of inferring the state of nature in populations. Obviously, the
larger the sample used, the more clearly the situation in the sample will
reflect that in the population, given that the sample is random. A number
of researchers have addressed themselves to this problem.

Sevier (1957), using N's from 105 to 250, tested the assumptions of
linearity, normality of conditional criterion distributions, and equality
of conditional variances. The data were from an education study, with
cumulative grade point average being the criterion and high school class
rank and various test scores being the predictors. Out of 24 tests of
the linearity assumption, only one showed a departure significant at the
.05 level. Out of 8 samples tested for equality of conditional variances,
only one showed a departure significant at the .05 level. However, 25
of the 60 tests for normality of the conditional criterion distributions
were significant at the .05 level. Violation of this assumption throws
interpretations of conditional standard deviations based on normal curve
tables into some doubt. However, this statistic is typically not used
in practical prediction situations, such as selection or placement.
Sevier's study indicates that the assumptions of linearity and equality
of conditional variances may be generally tenable.

Ghiselli and Kahneman (1962) examined 60 aptitude variables on one
sample of 200 cases and reported that fully 40 percent of the variables
departed significantly from the linear homoscedastic model. Ninety percent
of these departures were reported to have held up on cross-validation.
Tupes (1964) re-analyzed the Ghiselli and Kahneman data and found that only
20 percent of the relationships departed from the linear homoscedastic
model at the .05 level. He also found that three of the "significant"
departures from linearity were probably due to typographical or clerical
errors in the data. Later Ghiselli (1964) accepted and agreed with Tupes'
re-analysis of his data. Tupes' findings must be interpreted in light of
the fact that the frequency of departure from the linear homoscedastic model
expected at the .05 level is in fact much greater than 5%. Tupes carried
out two statistical tests on each test-criterion relation: one for
linearity and one for equality of conditional variances. Thus the expected
proportion of data samples in which at least one test is significant is
not .05 but rather a little over .09. If three statistical tests are
run at the .05 level (one for linearity, one for normality of conditional
distributions, and one for homogeneity of conditional distributions), the
expected proportion of data samples in which at least one of these tests
is significant is approximately .14 when relations in the parent
populations are perfectly linear and homoscedastic.
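
The chance rates quoted above follow from simple arithmetic if the tests are treated as independent, an assumption made here for illustration and not stated in the text. A minimal sketch with a hypothetical function name:

```python
def chance_of_any_rejection(alpha, n_tests):
    """Probability that at least one of n independent tests is significant
    at level alpha when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(round(chance_of_any_rejection(0.05, 2), 4))  # 0.0975
print(round(chance_of_any_rejection(0.05, 3), 4))  # 0.1426
```

The two values correspond to the "a little over .09" and "approximately .14" figures given in the text.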

Tiffin and Vincent (1960) found no significant departures from the
bivariate normal model in 15 independent samples of test-criterion data,
ranging in size from 14 to 157. In each set of data, a chi square test
was used to compare the percent of employees in the "successful" job
performance category in each fifth of the test score distribution to
the percentages predicted from the normal bivariate surface (which
incorporates the linear homoscedastic model) corresponding to the computed
validity coefficient. Surgent (1947) performed a similar analysis on
similar data and reported the same findings.

Hawk (1970) reported a major study searching for departures from
linearity. The data were drawn from 367 studies conducted on the General
Aptitude Test Battery (GATB), used by the U.S. Department of Labor, between
1950 and 1966. A total of 3303 relations, based on 23,428 individuals,
between the nine subtests of the GATB and measures of job performance
(typically supervisory ratings) were examined. The frequency of departures
from linearity significant at the .05 level was .054. Using the .01
level, the frequency was .012. Frequencies closer to the chance level can
hardly be imagined.

Brogden, during his years as technical director of what is now the
Army Research Institute for the Behavioral and Social Sciences, spent a
considerable amount of time and effort attempting to identify nonlinear
test-criterion relationships in large samples of military selection data.
Although quadratic and other higher order nonlinear equations sometimes
provided impressive fits to the data in the initial sample, not one of
the equations cross-validated successfully in a new sample from the same
population. In cross-validation samples, the nonlinear functions were
never superior to simple linear functions (Brogden, Note 1).

These findings, taken in toto, indicate that the linear homoscedastic
model generally fits the data in this area quite well. The linearity
assumption, the only truly critical assumption, is particularly well
supported.

We turn now to the question of normality of marginal distributions.
In certain forms (see Equation 2), the Brogden-Cronbach utility formulas
assume, in addition to linearity, a normal distribution for predictor
(test) scores. The Taylor-Russell tables, based on the assumption of
a normal bivariate surface, assume normality of the total test score
distribution also. One obviously relevant question is whether or not
violations of this assumption seriously distort utility estimates. Van
Naerssen (in Cronbach & Gleser, 1965) found that they do not. He derived a
set of utility equations parallel to the Brogden-Cronbach equations except
that they were based on the assumption of a rectangular distribution of test
scores. He found that when applied to the same set of empirical data,
the two kinds of equation produced very similar utility estimates (p. 288).
Cronbach and Gleser (1965, p. 160) point out that this finding "makes it
possible to generalize over the considerable variety of distributions
intermediate between normal and rectangular." Results from the Schmidt
and Hoffman (1973) study suggest the same conclusion. In their data
neither the predictor nor the criterion scores appeared to be normally
distributed. Yet the utility estimates produced by the Taylor-Russell
tables were only off marginally: 4.09 percent at SR = .30 and 11.29
percent at SR = .50.

Thus it appears that an obsessive concern with statistical assumptions
is not justified. This is especially true in light of the fact that for
most purposes, there is no need for utility estimates to be accurate down
to the last dollar. Approximations are usually quite adequate for the kinds
of decisions that these estimates are used to make (Van Naerssen, 1963,
p. 282; cf. also Cronbach & Gleser, 1965, p. 139). Alternatives to use of
the utility equations will typically be procedures which produce larger
errors, or even worse, no utility analyses at all. Faced with these
alternatives, errors in the 5-10 percent range appear negligible. Further,
if overestimation of utility is considered more serious than
underestimation, one can always employ conservative estimates of equation
parameters (e.g., $r_{xy}$, $SD_y$) to virtually guarantee against
overestimation of utilities.

Are Test Validities Situationally Specific?

The second reason we postulated for the failure of personnel psychologists
to exploit the Brogden-Cronbach utility models was belief in the doctrine of
situational specificity of validity coefficients. This belief precludes
generalization of validities from one setting to another, making
criterion-related validity studies, and utility analyses, necessary in each
situation. The empirical basis for the principle of situational specificity
has been the fact that considerable variability in observed validity
coefficients is typically apparent from study to study even when jobs and
tests appear to be similar or essentially identical (Ghiselli, 1966).
However, there are a priori grounds for postulating that this variance is
due to statistical, measurement, and other artifacts unrelated to the
underlying relation between test and job performance. There are at least
seven such sources of artifactual variance:

1.  Differences  between  studies  in  criterion  reliability. 

2.  Differences  between  studies  in  test  reliability. 

3.  Differences  between  studies  in  range  restriction. 

4.  Sampling error (i.e., variance due to N < ∞).

5.  Differences  between  studies  in  amount  and  kind  of 
criterion  contamination  and  deficiency  (Brogden  & 
Taylor,  1950). 

6.  Computational and typographical errors (Wolins, 1962).

7.  Slight  differences  in  factor  structure  between  tests 
of  a  given  type  (e.g.,  arithmetic  reasoning  tests). 

In a purely analytical substudy, Schmidt et al. (in press) showed that the
first four sources alone are capable, under specified and realistic
circumstances, of producing as much variation in validities as is typically
observed from study to study. They then turned to analyses of empirical
data. Using 14 distributions of validity coefficients from the published
and unpublished literature for various tests in the occupations of clerical
worker and first-line supervisor, they found that artifactual variance
sources (1) through (4) accounted for an average of 62 percent of the
variance in validity coefficients, with a range from 43 percent to 87
percent. Thus there was little remaining variance in which situational
moderators could operate. In an earlier study (Schmidt & Hunter, 1977),
it was found that sources (1), (3) and (4) alone accounted for an average
of about 50 percent of the observed variance in distributions of validity
coefficients presented by Ghiselli (1966, p. 29). If one could correct
for all seven sources of error variance, one would, in all likelihood,
consistently find that the remaining variance was zero or near zero.
That is, it is likely that the small amounts of remaining variance in
the studies cited here are due to the sources of artifactual variance
not corrected for. Thus there is now strong evidence that the observed
variation in validities from study to study for similar test-job
combinations is artifactual in nature. These findings cast considerable
doubt on the situational specificity hypothesis.
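
To give a sense of how one artifact, sampling error, can be compared with observed variance, the sketch below computes the share of observed variance in a set of validity coefficients that sampling error alone would be expected to produce. The text does not give the computational formulas used in the studies cited. The sketch assumes the common large-sample approximation $(1 - \bar{r}^2)^2/(\bar{N} - 1)$ for the sampling variance of a correlation, and the validity coefficients and sample sizes are invented for illustration.

```python
def sampling_error_share(validities, sample_sizes):
    """Return (mean r, observed variance, expected sampling variance, share).

    The mean and observed variance are weighted by sample size. The expected
    sampling variance uses (1 - mean_r**2)**2 / (mean_n - 1), an approximation
    assumed here rather than taken from the text. The share can exceed 1 when
    observed variance is smaller than sampling error alone would produce.
    """
    total_n = sum(sample_sizes)
    mean_r = sum(n * r for r, n in zip(validities, sample_sizes)) / total_n
    observed_var = sum(
        n * (r - mean_r) ** 2 for r, n in zip(validities, sample_sizes)
    ) / total_n
    mean_n = total_n / len(sample_sizes)
    error_var = (1.0 - mean_r ** 2) ** 2 / (mean_n - 1.0)
    return mean_r, observed_var, error_var, error_var / observed_var


# Hypothetical validity coefficients and sample sizes from six studies.
rs = [0.10, 0.35, 0.22, 0.41, 0.05, 0.30]
ns = [60, 80, 50, 70, 65, 75]
mean_r, observed_var, error_var, share = sampling_error_share(rs, ns)
print(round(mean_r, 3), round(observed_var, 4), round(error_var, 4), round(share, 2))
```

Corrections for differences in criterion reliability, test reliability, and range restriction would be layered on top of this step; they are omitted from the sketch.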

Rejection of the situational specificity doctrine obviously opens the
way to validity generalization. However, validity generalization is
possible in many cases even if the situational specificity hypothesis
cannot be definitively rejected. After correcting the mean and variance of
the validity distribution for sampling error, for attenuation due to
criterion unreliability, and for range restriction (based on average
values of both), one may find that a large percentage, say 90 percent,
of all values in the distribution lie above the minimum useful level
of validity. In such a case, one can conclude with 90% confidence
that true validity is at or above this minimum level in a new situation
involving the same test-type and job without carrying out a validation
study of any kind. Only a job analysis is necessary, in order to ensure
that the job at hand is a member of the class of jobs on which the
validity distribution was derived. In Schmidt and Hunter (1977), two of the
four validity distributions fell into this category, even though only three
sources of artifactual variance could be corrected for. In the later study
(Schmidt et al., in press) in which it was possible to correct for four
sources of error variance, 12 of the 14 corrected distributions had 90
percent or more of validities above levels that would typically be
indicative of significant practical utility (cf. Hunter & Schmidt, 1979).
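
The 90 percent criterion can be expressed as a lower percentile of the corrected validity distribution. The sketch below assumes, for illustration only, that the corrected distribution is approximately normal; the text does not state a distributional form. The mean, standard deviation, and minimum useful validity are hypothetical.

```python
from statistics import NormalDist


def validity_exceeded_by(mean_validity, sd_validity, proportion=0.90):
    """Value that the stated proportion of corrected validities lies above,
    assuming an approximately normal corrected distribution."""
    if sd_validity == 0:
        return mean_validity  # no remaining variance: every value equals the mean
    return NormalDist(mean_validity, sd_validity).inv_cdf(1.0 - proportion)


# Hypothetical corrected distribution: mean .50, SD .10; minimum useful level .20.
lower_value = validity_exceeded_by(0.50, 0.10)
print(round(lower_value, 3), lower_value >= 0.20)
```

If the printed value is at or above the minimum useful level, the distribution meets the criterion described in the text for generalizing validity to a new setting.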

These methods and findings indicate that in the future validity
generalization will be possible for a wide variety of test-job combinations.
Such a development will do much to encourage the application of
decision-theoretic utility estimation tools.

Difficulties in Estimating $SD_y$

The third major reason for neglect of the powerful Brogden-Cronbach
utility model was the difficulty of estimating $SD_y$. As noted above, the
generally recommended procedure for estimating $SD_y$ is by use of cost
accounting procedures. Such procedures are supposed to be used to estimate
the dollar value of performance of a number of individuals (cf. Brogden &
Taylor, 1950a), and the SD of these values is then computed. Roche's
(1961) dissertation illustrates well the tremendous time and effort such
an endeavor entails. This study (summarized in Cronbach and Gleser, 1965,
pp. 256-266) was carried out on radial drill operators in a large
midwestern plant of a heavy equipment manufacturer. A cost accounting
procedure called "standard costing" was used to determine the contribution
of each employee to the profits of the company. The procedure was
extremely detailed and complex, involving cost estimates for each piece
of material machined, direct and indirect labor costs, overhead, perishable
tool usage, etc. There was also a "burden adjustment" for below standard
performance. But despite the complexity and apparent objectivity, Roche
is compelled to admit that "many estimates and arbitrary allocations
entered into the cost accounting" (p. 263, in Cronbach & Gleser, 1965).
Cronbach, in commenting on the Roche study after having discussed it with
Roche, states that some of the cost accounting procedures used are unclear
or questionable (Cronbach & Gleser, 1965, pp. 266-267) and that the
accountants perhaps did not fully understand the utility estimation problem.
Thus even given great effort and expense, cost accounting procedures may
nevertheless lead to a questionable final product.

Recently we have developed a procedure for obtaining rational estimates
of SDy. This method was used in a pilot study by 62 experienced supervisors
of budget analysts to estimate SDy for that occupation. Supervisors were
used as judges because they have the best opportunities to observe actual
performance and output differences between employees on a day-to-day basis.
The method is based on the following reasoning: if job performance in
dollar terms is normally distributed, then the difference between the value
to the organization of the products and services produced by the average
employee and those produced by an employee at the 85th percentile in
performance is equal to SDy. Budget analyst supervisors were asked to estimate
both these values; the final estimate was the average difference across the
62 supervisors. The estimation task presented to the supervisors may appear
difficult at first glance, but only one out of 62 supervisors objected and
stated that he did not think he could make meaningful estimates. Use of a
carefully developed questionnaire to obtain the estimates apparently aided
significantly; a similar questionnaire was used in the present study and is
described later. The final estimate of SDy for the budget analyst occupation
was $11,327 per year (standard error of the mean = $1,120). This
estimate is based on incumbents rather than applicants and must therefore
be considered to be an underestimate. As noted earlier, it is generally
not critical that estimates of utility be accurate down to the last dollar.
Utility estimates are typically used to make decisions about selection
procedures, and for this purpose only errors large enough to lead to incorrect
decisions are of any consequence. Such errors may be very infrequent.
Further, they may be as frequent, or more frequent, when cost accounting
procedures are used. As we noted above, Roche (1961) found that, even
in the case of the simple and structured job he studied, the cost
accountants were frequently forced to rely on subjective estimates and
arbitrary allocations. This is generally true in cost accounting and
may become a more severe problem as one moves up the occupational
hierarchy. What objective cost accounting techniques, for example,
can be used to assess the dollar value of an executive's impact on
subordinate morale? It is the jobs with the largest SDy values, i.e.,
the jobs for which ΔU/selectee is potentially greatest, that are handled
least well by cost accounting methods. Rational estimates, to one
degree or another, are virtually unavoidable at the higher job levels.
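
The averaging step in this procedure can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
name and the four pairs of judgments are hypothetical, not data from the
pilot study, and the standard error is taken to be the usual sample standard
deviation of the judges' differences divided by the square root of the
number of judges.

```python
from math import sqrt
from statistics import mean, stdev


def rational_sdy(avg_values, p85_values):
    """Return (SDy estimate, standard error of the mean) from paired judgments.

    Each judge supplies a dollar value for the average employee and for an
    employee at the 85th percentile; SDy is the mean of the differences.
    """
    if len(avg_values) != len(p85_values) or len(avg_values) < 2:
        raise ValueError("need paired estimates from at least two judges")
    diffs = [high - mid for mid, high in zip(avg_values, p85_values)]
    sdy = mean(diffs)
    sem = stdev(diffs) / sqrt(len(diffs))  # sample SD of differences / sqrt(n)
    return sdy, sem


# Hypothetical judgments from four supervisors (dollars per year).
avg = [20000, 25000, 22000, 30000]
p85 = [30000, 37000, 31000, 44000]
sdy, sem = rational_sdy(avg, p85)
print(f"SDy estimate: {sdy:.0f}, standard error: {sem:.0f}")
```

With many judges, a single supervisor's idiosyncratic estimate moves the
mean very little, which is the rationale given for pooling.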

Our procedure has at least two advantages in this respect. First,
the mental standard to be used by the supervisor-judges is the estimated
cost to the organization of having an outside consulting firm provide
the same products and/or services. In many occupations, this is a
relatively concrete standard. Second, the idiosyncratic tendencies,
biases, and random errors of individual experts can be controlled for
by averaging across a large number of judges. In our initial study,
the final estimate of SDy was the average across 62 supervisors. Unless
there is an upward or downward bias in the group as a whole, such an
average should be fairly accurate. In our example, the standard error
of the mean was $1,120. This means that the interval $9,480 to $13,175
should contain 90 percent of such estimates. (One truly bent on being
conservative could employ the lower bound of this interval in his or
her calculations.)
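
The interval quoted above can be reproduced, to within a dollar or two, from
the reported mean and standard error. The sketch below assumes a normal
sampling distribution and a two-sided 90 percent multiplier rounded to 1.65;
both the function name and that rounding are assumptions made for
illustration.

```python
def interval_for_mean(estimate, standard_error, z=1.65):
    """Symmetric interval: estimate plus or minus z standard errors."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Reported budget analyst figures: SDy = $11,327, standard error = $1,120.
low, high = interval_for_mean(11_327, 1_120)
print(round(low), round(high))
```

The lower bound returned here is the conservative figure the parenthetical
remark refers to.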

Methods similar to the one described here have been used successfully
by the Decision Analysis Group of the Stanford Research Institute (Howard,
Note 2) to scale otherwise unmeasurable but critical variables. Resulting
measures have been used in the application of decision-theoretic principles
to high-level policy decision-making in such areas as nuclear power plant
construction, corporate risk policies, investment and expansion programs,
and hurricane seeding (Howard, 1966; Howard & Matheson, 1972; Raiffa,
1963; Matheson, Note 3). All indications are that the response to the
work of this group has been quite positive; these methods have been
judged by high level decision-makers to contribute valuably to improvement
of socially and economically important decisions.

In most cases, the alternatives to use of a procedure like ours to
estimate SDy are unpalatable. The first alternative is to abandon the
idea of a utility analysis. This course of action will typically lead
to a gross (implicit) underestimate of the economic value of valid selection
procedures. This follows if one accepts our contention (Hunter &
Schmidt, 1979) that the empirical studies that are available indicate
much higher dollar values than psychologists have expected. The second
alternative in most situations is use of a less systematized, and probably
less accurate, procedure for estimating SDy. Both these alternatives can
be expected to lead to more erroneous decisions about selection procedures.

The procedure for estimating SDy described here assumes that dollar
outcomes are normally distributed. One purpose of the present study is to
evaluate that assumption.

The present study has three purposes: (1) to illustrate the magnitude
of the productivity implications of a valid selection procedure, (2) to
demonstrate the application of decision-theoretic utility equations, and
(3) to test the assumption that the dollar value of employee productivity
is normally distributed.

Procedure 

The major reason for our choice of the job of computer programer was
that a previous study (Rosenberg, Schmidt, & Hunter, Note 4) had provided
remarkably accurate validity estimates for this job. Applying the
Schmidt-Hunter (1977) validity generalization model to all available validity data
for the Programer Aptitude Test (PAT; Hughes & McNamara, Note 5; McNamara
& Hughes, 1961), this study found that the percent of variance in validity
coefficients accounted for in the case of job proficiency criteria for the
PAT total score was 94 percent. This finding effectively refutes the
situational specificity hypothesis. The estimated true validity was .76.
Thus the evidence is strong that the (multivariate) total PAT score validity
is quite high for predicting performance of computer programers and that
this validity is essentially constant across situations (e.g., different
organizations; Rosenberg et al., Note 4). Since it is total score that
is typically used in selecting programers, this study concerns itself
only with total score validity. Because the PAT is no longer available
commercially, testing costs had to be estimated. In this study, we
assumed a testing cost of $10 per examinee.

Estimates of SDy were provided by experienced supervisors of computer
programers in 10 Federal agencies. These supervisors were selected
by their own supervisors after consultation with the first author.
Participation was voluntary. Of 147 questionnaires distributed, 105 were
returned (all in usable form), for a return rate of 71.4%. In order to
test the hypothesis that dollar outcomes are normally distributed, the
supervisors were asked to estimate values for the 15th percentile ("low
performing programers"), as well as the 50th percentile ("average
programers"), and the 85th percentile ("superior programers"). The resulting
data thus provide two estimates of SDy. If the distribution is approximately
normal, these two estimates will not differ substantially in value.

The questionnaire used to elicit supervisor estimates is shown (in
relevant part) in Table 1. The wording of this questionnaire was carefully
developed and pretested on a small sample of programer supervisors
and personnel psychologists. None of the programer supervisors who
returned questionnaires in the study reported any difficulty understanding
the questionnaire or in making the estimates.

This study focuses on selection of computer programers at the GS-5
through 9 levels. GS level 5 is the lowest level in this occupational
series. Beyond GS-9, it is unlikely that an aptitude test like the PAT
would be used in selection. Applicants for higher level programer positions
are expected (and required) to have considerable developed expertise
in programing, and are selected on the basis of achievement and experience,
rather than directly on aptitude. The vast majority of programers hired
at the GS-9 level are promoted to GS-11 after one year. Similarly, all but
a minority hired at the GS-5 level advance to GS-7 in one year and to GS-9
the following year. Therefore the SDy estimates were obtained for the
GS-9-11 levels, as can be seen in Table 1. Statistical information obtained
from the Bureau of Personnel Management Information Systems of the Civil
Service Commission indicated that the number of programer incumbents in
the Federal Government at the relevant levels (GS-5 through 9) was 4,404
(as of October 31, 1976, the latest date for which figures were available).
The total number of computer programers at all grade levels was 18,498.
For 1975-1976, 61.3 percent of all new hires were at the GS-5-9 levels.
The number of new hires government-wide in this occupation at these levels
was 655 and 565 for calendar years 1975 and 1976, respectively, for an
average yearly selection rate of 618. The average tenure of the GS-5-9
computer programers was determined to be 9.69 years.

Data from the 1970 U.S. Census showed that there were 166,556 computer
programers in the U.S. in that year. Because the growth rate has been
rapid in this occupation recently, this figure undoubtedly underestimates
the current number of programers. However, it is the most recent
estimate available. In any event, the effect of underestimation on the
utility results is a conservative one. It was not possible to determine
the number of computer programers that are hired yearly in the U.S.
economy. For purposes of this study, it was assumed that the turnover
rate was 10 percent in this occupation and that therefore .10 (166,556) or
16,655 were hired to replace those who had quit, retired, or died.
Extrapolating from the Federal to the private sector workforce, it was assumed
that 61.3 percent of these new hires were at occupational levels for which
the PAT would be appropriate. Thus it was assumed that .613 (16,655) or
10,210 computer programers could be hired each year in the U.S. economy
using the PAT. In view of the current rapid expansion of this occupation,
it is likely that this number is a substantial underestimate.
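
The two-step estimate of yearly hires is simple arithmetic and can be checked
directly. In the sketch below the variable names are illustrative, and the
choice to drop the fractional hire in the first step (16,655.6 becomes
16,655) is an assumption made to match the figure printed in the text.

```python
CENSUS_PROGRAMERS_1970 = 166_556   # 1970 U.S. Census count
ASSUMED_TURNOVER = 0.10            # turnover rate assumed in the text
SHARE_AT_PAT_LEVELS = 0.613        # Federal share of new hires at GS-5-9

# Yearly replacement hires; the fractional hire is dropped (assumption).
replacements = int(CENSUS_PROGRAMERS_1970 * ASSUMED_TURNOVER)

# Hires at levels for which the PAT would be appropriate, rounded to a whole number.
pat_hires = round(replacements * SHARE_AT_PAT_LEVELS)

print(replacements, pat_hires)
```

Both figures scale linearly with the assumed turnover rate, so a different
turnover assumption changes the national utility estimates in proportion.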

It was not possible to determine prevailing selection ratios (SR) for
computer programers in the general economy. Because the total yearly number
of applicants for this job in the government could not be determined, it
was also impossible to estimate the government SR. This information lack
is of no real consequence, however, since it is more instructive to examine
utilities for a variety of selection ratios. Utilities were calculated for
SR's of .05, .10, .20, ..., .80. The gains in utility or productivity as
computed from equation (4) are those that result when a valid procedure is
introduced where previously no procedure or a totally invalid procedure has
been used. The assumption that the true validity of the previous procedure
is essentially zero may be valid in some cases, but in other situations the
PAT would, if introduced, replace a procedure with lower but nonzero true
validity. Hence, utilities were calculated assuming previous procedure
true validities of .20, .30, .40 and .50, as well as .00.

Using a modification of Equation (4), utilities that would result from
one year's use of the PAT for selection of new hires in the Federal
Government and the economy as a whole were computed for each of the
combinations of SR and previous procedure validity given above. When the previous
procedure was assumed to have zero validity, its associated testing cost
was also assumed to be zero; that is, it was assumed that no procedure
was used and that otherwise prescreened applicants were hired randomly.
When the previous procedure was assumed to have a nonzero validity, its
associated cost was assumed to be the same as that of the PAT, that is,
$10 per applicant. As mentioned above, average tenure for government
programers was found to be 9.69 years; in the absence of other information,
this tenure figure was also assumed for the private sector.
ΔU/selectee per year was multiplied by 9.69 to give final ΔU/selectee.
Cost of testing was charged only to the first year.

Building all of these factors into equation (4), we obtain the
equation actually used in computing the utilities:

\[ \Delta U = t N_s (r_1 - r_2)\, SD_y\, \phi / p - N_s (C_1 - C_2) / p, \]

where:

\(\Delta U\) = the gain in productivity in dollars from using the new
      selection procedure for one year,
\(t\) = tenure in years of the average selectee; here 9.69,
\(N_s\) = number selected in a given year; this figure was 618 for
      the Federal Government and 10,210 for the U.S. economy,
\(r_1\) = validity of the "new" procedure, here the PAT; \(r_1 = .76\),
\(r_2\) = validity of the previous procedure; \(r_2\) ranges from zero
      to .50,
\(C_1\) = per applicant cost of the new procedure, here $10,
\(C_2\) = per applicant cost of previous procedure, here zero or $10.

The terms SDy, φ, and p are as defined previously. The figure for SDy
was the average of the two estimates obtained in this study. Note that
although this equation gives the productivity gain that results from
substituting for one year the new (more valid) selection procedure for
the previous procedure, these gains are not all realized the first year.
They are spread out over the tenure of the new employees.
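
For readers who wish to experiment with the utility equation above, the
following Python sketch evaluates it for one combination of SR and previous
procedure validity. Several things here are assumptions rather than
statements from the text: the function and parameter names are illustrative;
φ is taken to be the ordinate of the standard normal curve at the cut score
that passes a proportion p of applicants; and the cost term is taken as the
number of applicants (N_s/p) times the per applicant cost difference.

```python
from statistics import NormalDist


def ordinate_over_sr(p):
    """phi/p: standard normal ordinate at the cut score for SR = p, divided by p."""
    std_normal = NormalDist()
    cut = std_normal.inv_cdf(1.0 - p)   # z score above which a proportion p falls
    return std_normal.pdf(cut) / p


def utility_gain(p, r2, c2, t=9.69, n_s=618, r1=0.76, sd_y=10_413, c1=10.0):
    """Dollar gain from one year's use of the new procedure (default: Federal case)."""
    benefit = t * n_s * (r1 - r2) * sd_y * ordinate_over_sr(p)
    testing_cost = n_s * (c1 - c2) / p   # N_s / p applicants are tested
    return benefit - testing_cost


# SR = .20, previous procedure validity .30 and cost $10 per applicant.
federal = utility_gain(p=0.20, r2=0.30, c2=10.0)
print(f"{federal / 1e6:.1f} million dollars")
```

Values computed this way fall within about half a million dollars of the
corresponding Table 2 entries; small discrepancies are to be expected if the
published figures were computed from rounded table values of φ.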

Results  and  Discussion 

Estimation of Yearly SDy

The two estimates of SDy were quite similar. The mean estimated
difference in dollar value of yearly job performance between programers
at the 85th and 50th percentiles in job performance was $10,871 (standard
error = $1,673). The figure for the difference between the 50th and 15th
percentiles was $9,955 (standard error = $1,035). The difference of 916
dollars is roughly 8 percent of each of the estimates and is not
statistically significant. Thus the hypothesis that computer programer
productivity in dollars is normally distributed cannot be rejected. The
distribution appears to be at least approximately normal. The average of these
two estimates, $10,413, was the SDy figure used in the utility calculations
below. This figure must be considered to be an underestimate since it
applied to incumbents rather than to the applicant pool. As can be seen
from the two standard errors, supervisors showed somewhat better agreement
on the productivity difference between "low performing" and "average
programers" than on the difference between "average" and "superior" programers.
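
The comparison of the two estimates can be checked from the reported figures
alone. The sketch below is a rough illustration, not the test used in the
study: it treats the two standard errors as if they came from independent
samples, which they did not (the same supervisors supplied both sets of
judgments), so the resulting ratio should be read only as an order of
magnitude.

```python
from math import sqrt

upper_gap, upper_se = 10_871, 1_673   # 85th minus 50th percentile, with standard error
lower_gap, lower_se = 9_955, 1_035    # 50th minus 15th percentile, with standard error

sd_y = (upper_gap + lower_gap) / 2    # average of the two estimates
difference = upper_gap - lower_gap

# Assumption: independent standard errors (a simplification, see lead-in).
z = difference / sqrt(upper_se**2 + lower_se**2)

print(sd_y, difference, round(z, 2))
```

A ratio well below conventional critical values is consistent with the
conclusion that the two estimates do not differ reliably.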

Table 2 shows the gains in productivity in millions of dollars that
would result from one year's use of the PAT to select computer programers
in the Federal Government for different combinations of SR and previous
procedure validity. As expected, these gains increase as SR decreases
and as the validity of the previous procedure decreases. When SR is .05
and the previous procedure has no validity, use of the PAT for one year
produces a productivity gain of 97.2 million dollars. At the other
extreme, if SR is .80 and the procedure the PAT replaces has a validity
of .50, the gain is "only" 5.6 million dollars. The figures in all cells
of Table 2 are quite large, larger than most industrial-organizational
psychologists would, in our judgment, have expected. These figures, of
course, are for total utility. Gain per selectee for any cell in Table
2 can be computed by dividing the cell entry by 618, the assumed yearly
number of selectees. For example, when SR = .20 and the previous
procedure has a validity of .30, gain per selectee is $64,725. As
indicated earlier, the gains shown in Table 2 are produced by one year's
use of the PAT but are not all realized during the first year; they are
spread out over the tenure of the new employees. Per year gains for any
cell in Table 2 can be obtained by dividing the cell entry by 9.69, the
average tenure of computer programers.
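
As a worked illustration of these two conversions, take the Table 2 entry
for SR = .20 and a previous procedure validity of .30 (40.0 million dollars):

```latex
\[
\frac{\$40.0\ \text{million}}{618\ \text{selectees}} \approx \$64{,}725\ \text{per selectee},
\qquad
\frac{\$40.0\ \text{million}}{9.69\ \text{years}} \approx \$4.13\ \text{million per year}.
\]
```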

Table 3 shows productivity gains for the economy as a whole resulting
from use of the PAT or substitution of the PAT for less valid procedures.
Table 3 figures are based on the assumed yearly selection of 10,210
computer programers nationwide. Again, the figures are for the total
productivity gain, but gain per selectee can be computed by dividing the
cell entry by the number selected. Once mean gain per selectee is
obtained, the reader can easily compute total gain for any desired number
of selectees. As expected, these figures are considerably larger,
exceeding one billion dollars in several cells. Although we have no
direct evidence on this point, we again judge that the productivity gains
are much higher than most industrial-organizational psychologists would
have suspected.

In addition to the assumptions of linearity and normality discussed
earlier, the productivity gain figures in Tables 2 and 3 are based on
the assumption that selection proceeds from top-scoring applicants
downward until the SR has been reached. That is, these analyses assume
that selection procedures are used optimally. Because of the linearity
of the relation between test score and job performance, any other usage
of a valid test would result in lower mean productivity levels among
selectees. For example, if a cutting score were set at a point lower
than that corresponding to the SR and if applicants scoring above this
minimum score were then selected randomly (or selected on other
nonvalid procedures or considerations), productivity gains would be
considerably lower than shown in Tables 2 and 3. (They would, however,
typically still be substantial.)

The PAT is no longer available commercially. Originally marketed
by Psychological Corporation, it was later distributed by IBM as part of
"package deals" to computer systems purchasers. However, this practice
was dropped about 1974, and since then the PAT has not been available to
most users (Note 6). This fact, however, need create no problems in terms
of validity generalization. The results of this study generalize directly
to other tests and subtests with the same factor structure. The three
subscales of the PAT are composed of very conventional number series,
figure analogies, and arithmetic reasoning items. New tests can easily
be constructed that correlate 1.00, corrected for attenuation, with the
PAT subtests.

It should be noted that productivity gains comparable to those shown
in Tables 2 and 3 can probably be realized in other occupations, such as
that of clerical worker, in which lower SDy values will be offset by
the larger numbers of selectees. Pearlman, Schmidt, and Hunter (Note 7)
present extensive data on the generalizability of validity for a number
of different kinds of cognitive measures (constructs) for several job
families of clerical work.

There is another way to approach the question of productivity gains
resulting from use of valid selection procedures. One can ask what the
productivity gain would have been had the entire incumbent population
been selected using the more valid procedure. As indicated earlier, the
incumbent population of interest in the Federal Government numbers 18,498.
As an example, suppose this population had been selected using a procedure
with validity of .30 using a SR of .20. Then had the PAT been used
instead, the productivity gain would have been approximately 1.2 billion
dollars [9.69 (18,498)(.76 - .30)(10,413)(.28/.20)]. Expanding this example
to the economy as a whole, the productivity gain that would have resulted
is 10.78 billion dollars.
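
Written out term by term, the bracketed Federal computation is as follows.
No testing cost term appears because the two procedures are assumed to cost
the same per applicant:

```latex
\[
\Delta U = 9.69 \times 18{,}498 \times (.76 - .30) \times 10{,}413 \times \frac{.28}{.20}
\approx 1.20 \times 10^{9}\ \text{dollars}.
\]
```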

Obviously, there are many other such examples that can be worked out,
and we encourage readers to ask their own questions and derive their own
answers. However, virtually regardless of the question, the answer always
seems to include the conclusion that it does make a difference, an important
practical difference, how people are selected. We conclude that the
implications of valid selection procedures for workforce productivity are much
greater than most of us have realized in the past.

TABLE 1

Questionnaire
Estimation of Selection Utility
Computer Programers (GS-334)

__________ Dept.   __________ Agency

INSTRUCTIONS

The dollar utility estimates we are asking you to make are critical in
estimating the relative dollar value to the government of different selection
methods. In answering these questions, you will have to make some very
difficult judgments. We realize they are difficult and that they are
judgments or estimates. You will have to ponder for some time before giving each
estimate, and there is probably no way you can be absolutely certain your
estimate is accurate when you do reach a decision. But keep in mind three
things:

(1) The alternative to estimates of this kind is application of
    cost accounting procedures to the evaluation of job performance.
    Such applications are usually prohibitively expensive. And in
    the end, they produce only imperfect estimates, like this
    estimation procedure.

(2) Your estimates will be averaged in with those of other
    supervisors of computer programers. Thus errors produced by too
    high and too low estimates will tend to be averaged out,
    providing more accurate final estimates.

(3) The decisions that must be made about selection methods do not
    require that all estimates be accurate down to the last dollar.
    Substantially accurate estimates will lead to the same
    decisions as perfectly accurate estimates.

Based on your experience with agency programers, we would like for
you to estimate the yearly value to your agency of the products and services
produced by the average GS 9-11 computer programer. Consider the quality
and quantity of output typical of the average programer and the value of
this output. In placing an overall dollar value on this output, it may
help to consider what the cost would be of having an outside firm provide
these products and services.

    Based on my experience, I estimate the value to my
    agency of the average GS 9-11 computer programer at
    __________ dollars per year.

We would now like for you to consider the "superior" programer. Let
us define a superior performer as a programer who is at the 85th percentile.
That is, his or her performance is better than that of 85% of his or her
fellow GS 9-11 programers, and only 15% turn in better performances.
Consider the quality and quantity of the output typical of the superior
programer. Then estimate the value of these products and services. In
placing an overall dollar value on this output, it may again help to
consider what the cost would be of having an outside firm provide these
products and services.

    Based on my experience, I estimate the value of a
    superior GS 9-11 computer programer to be
    __________ dollars per year.

Finally, we would like you to consider the "low performing" computer
programer. Let us define a low performing programer as one who is at the
15th percentile. That is, 85% of all GS 9-11 computer programers turn in
performances better than the low performing programer, and only 15% turn
in worse performances. Consider the quality and quantity of the output
typical of the low performing programer. Then estimate the value of these
products and services. In placing an overall dollar value on this output,
it may again help to consider what the cost would be of having an outside
firm provide these products and services.

    Based on my experience, I estimate the value to my
    agency of the low performing GS 9-11 computer programer
    at __________ dollars per year.

TABLE 2

Estimated Productivity Increase from One Year's Use of the PAT
to Select Computer Programers in the Federal Government
(In Millions of Dollars)

           True Validity of Previous Procedure
 SR       .00      .20      .30      .40      .50

.05      97.2     71.7     58.9     46.1     33.3
.10      82.8     60.1     50.1     39.2     28.3
.20      66.0     48.6     40.0     31.3     22.6
.30      54.7     40.3     33.1     25.9     18.7
.40      45.6     34.6     27.6     21.6     15.6
.50      37.6     27.7     22.8     17.8     12.9
.60      30.4     22.4     18.4     14.4     10.4
.70      23.4     17.2     14.1     11.1      8.0
.80      16.5     12.2     10.0      7.8      5.6

TABLE 3

Estimated Productivity Increase from One Year's Use of the PAT
to Select Computer Programers in the U.S. Economy
(In Millions of Dollars)

           True Validity of Previous Procedure
 SR       .00      .20      .30      .40      .50

.05      1605     1184      973      761      550
.10      1367     1008      828      648      468
.20      1091      804      661      517      373
.30       903      666      547      428      309
.40       753      555      455      356      257
.50       622      459      376      295      213
.60       501      370      304      238      172
.70       237      285      234      183      132
.80       273      201      165      129


JOB PERFORMANCE OF USAF BYPASSED SPECIALISTS


Captain  William  H.  Cummings  and  Captain  David  S.  Vaughan 
USAF  Occupational  Measurement  Center 


An Apprentice Knowledge Test (AKT) is a 65-item multiple choice
examination designed to measure job knowledge at the three-skill level
in a particular Air Force enlisted job specialty. An AKT is also,
to a large extent, a source of headaches for the Occupational Measurement
Center, where the tests are written. The reason for this is some
of the complaints the Center has received about the results of these
tests. These complaints center on the observation that some airmen
who pass the tests, and who are therefore selected for entry into
the career field at the apprentice level, cannot do the work that is
expected or required of them. This paper deals with our attempts to
solve this problem and relieve some of the headaches.

One major responsibility of the Occupational Measurement Center
is to construct the tests that support the Weighted Airman Promotion
System, and related tests. The promotion testing program, which includes
the Specialty Knowledge Tests and the Promotion Fitness Examinations,
has proceeded smoothly since its inception in 1969. However, as noted,
the Apprentice Knowledge Test program (which currently includes 151 of
the "related tests") has run into a number of problems. These problems
spring largely from some of the uses to which the tests are put, which
are quite different from the uses of the promotion tests.

As noted, AKTs are used to select airmen for entry into a career
field at the apprentice level, or the three-skill level. In this
capacity, a major use of the AKT is to allow an airman to bypass
technical school by showing his/her proficiency on the test. For example,
if an airman has prior civilian or military training as an orderly,
he/she may take the AKT for the Medical Services career field; or if
he/she is trained in electronics, he/she can take an AKT appropriate to one
of the numerous electronics career fields. If he/she passes, he/she will
go directly to his/her first assignment with his/her three-skill level,
rather than through technical school. This program obviously represents
a major time savings for the airman and a major dollar savings for the
Air Force.

To pass the AKT, the airman must score higher than thirty percent
of the airmen who have previously taken the AKT. This scoring system,
we feel, is responsible for most of the complaints with the AKT system.
If most of the airmen who have already taken the test are well-qualified,
then the examinee will need a substantial amount of job knowledge to pass
his/her AKT. If not, he/she may get by with only a minimal display of
knowledge. Therefore, passing an AKT does not necessarily have any meaning
relative to the knowledge required to perform at the three-skill level on
the job.

We have recently been developing a criterion-referenced scoring system
for the AKT. Under this system, each AKT, as it is developed, would be
first administered at the three-level technical school for that career
field. The AKT passing score would then be established relative to the
performances of recent technical school graduates on the test. Technical
school graduates are a good reference group, since they are generally
deemed to have the minimum knowledge required for adequate performance
at the three-skill level. By passing the AKT, the bypass candidate will
be demonstrating something more than good performance relative to a group
of examinees with unknown qualities. He/she will be demonstrating that
he/she has at least the minimum knowledge required for successful performance
as an entry-level airman. Vaughan (1976a, b) has previously investigated
this criterion-referencing system and found it to be a workable procedure.

Present plans call for the passing score to be set at the tenth
percentile of technical school graduates' scores on the test. Use of
the tenth percentile as the passing score is fairly arbitrary but is based
on some sound logical considerations. The percentile for the passing
score should not be set extremely high, since this would require the bypass
candidates to know more than a substantial percentage of technical
school graduates. This would be both unfair to the bypass candidate and
wasteful in terms of retraining already-trained airmen. However, the
percentile should not be set at the very lowest level of performance by the
technical school graduates, either. Extremely low scores are likely to
contain considerable error (Lord & Novick, 1968) and may reflect less
job knowledge than the examinee actually has. Therefore, the tenth
percentile was selected.
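
The two scoring rules described above can be contrasted in a short sketch.
The Python below is illustrative only. The function name and the two score
lists are hypothetical, and the percentile convention (the lowest raw score
that is strictly higher than the required share of the reference group, with
no interpolation) is an assumption, since the paper does not say how ties
are handled.

```python
from typing import Sequence


def passing_score(reference_scores: Sequence[int], percent_exceeded: float) -> int:
    """Return the lowest raw score that is strictly higher than at least
    `percent_exceeded` percent of the reference group's scores.

    Assumed convention: no interpolation; tied scores do not count as exceeded.
    """
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    n = len(reference_scores)
    for candidate in range(min(reference_scores), max(reference_scores) + 2):
        exceeded = sum(score < candidate for score in reference_scores)
        # Compare counts rather than floating-point percentages.
        if exceeded * 100 >= percent_exceeded * n:
            return candidate
    raise RuntimeError("no passing score found")  # not reached for 0-100 percent


# Hypothetical raw scores on a 65-item test (illustrative values only).
previous_examinees = [20, 22, 25, 26, 28, 30, 31, 33, 35, 40]
tech_school_graduates = [30, 34, 36, 38, 40, 42, 45, 47, 50, 55]

# Current rule: beat thirty percent of earlier AKT examinees.
norm_referenced_cutoff = passing_score(previous_examinees, 30)
# Proposed rule: tenth percentile of technical school graduates.
criterion_referenced_cutoff = passing_score(tech_school_graduates, 10)
```

Under the first rule the cutoff moves whenever the pool of earlier examinees
changes; under the second it is tied to a group whose qualifications are known.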

The present study was designed to eliminate some of the arbitrariness
involved in use of the tenth percentile as the passing point. Two groups
of airmen (technical school graduates and bypassed specialists) who had
recently entered the medical services career field were compared on a job
performance survey measure. This comparison provides the information
needed to allow us to set the AKT passing point realistically. If the
bypassed specialists perform about as well as the graduates, then a pass/
fail point equal to or below that of the tenth percentile would be
appropriate. If the bypassed specialists do not perform as well, a more
stringent criterion may be necessary.

Another aspect of this study is that it will demonstrate the extent
to which criterion referencing will affect the job performance of
bypassed specialists. The new scoring system will, in most cases, affect
the amount of basic knowledge the bypassed specialist will bring to his/
her first assignment. It is not yet clear, however, how this knowledge
difference will affect actual job performance.


Method


Subjects

Lists of recent three-skill-level technical school graduates and bypassed
specialists in the medical services career field were obtained
from the Air Force Military Personnel Center. From these lists, 306
airmen were selected for participation. Seven f.y-»>infr technical school
graduates and 36 bypassed specialists returned booklets in complete and
usable form, representing an overall usable return rate of 3h%.


Materials

The main part of the survey materials was a modified job inventory
booklet. The inventory booklets are developed by the Center's Occupational
Survey Branch for the various airman specialties. Each inventory booklet
is designed to contain a comprehensive list of all of the tasks that might
be performed by any individual in a given specialty. Each job task very
specifically describes a corresponding job behavior (e.g., "Assemble
equipment for cardiac monitoring," "Administer eye irrigations," "Obtain
blood from blood bank"). In the case of Medical Services, there are 505
listed tasks, and additional space is provided for up to 69 write-in tasks.
In addition to the task data section, an extensive section was included
for background information on the airman: historical data, time in present
job, time in career field, duty area, types of equipment used, etc. A
similar background information booklet was provided for background data on
the supervisor.


Procedure 

For the purposes of this study, the survey procedure followed three
steps. First, the airman was asked to complete the survey booklet by
checking all of the tasks that he/she performed in his/her present job (see
Fig. 1). Second, the supervisor was asked to rate the airman's performance
on a 7-point scale for each task that was checked off. A sample rating
scale was provided at the top of each page, indicating that a "1"
represented "Very Much Below Average" performance, up to a "7", which
represented "Very Much Above Average" performance. Third, approximately one
month after the survey booklet was returned, the supervisor was mailed
a follow-up questionnaire. This questionnaire requested a single rating
of the airman's overall job performance on a 30-point Likert-type scale.

[Figure 1 reproduces page 6 of 34 of the job inventory (duty-task list),
section G, "Performing Nursing Procedures (Continued)." Each listed task,
such as "Apply heat by compresses" or "Apply long arm plaster casts," has
a check column for tasks done now and a column for the 7-point task
performance rating.]

Figure 1. Sample from survey booklet


Dependent Measures

Job  performance  data  from  the  505  job  tasks,  in  addition  to  data 
previously  obtained  on  these  tasks,  were  condensed  into  four  dependent 
measures: 

TOTAL TASKS, the total number of job tasks performed by the airman. This
measure was a count of all of the tasks for which the supervisor gave the
airman a rating. Thus, it was not a single count of tasks the airman
claimed to perform, but an indication of the tasks that the supervisor
recognized the airman as performing.

X (DIFF x RATING), the average of the task performance ratings, with
each task performance rating multiplied by the difficulty of that task.
The task difficulty data were obtained from the Occupational Survey Report
(OSR) previously available for this career field (Ballentine & Cole, 1975).
Job incumbents had rated the "Task Learning Difficulty" of each task
("the need for lengthy, systematic training before a new member of the
appropriate Air Force Specialty could perform the task adequately") on
a 1-9 scale, with a rating of "1" indicating "Least Difficult to Learn"
and a rating of "9" indicating "Most Difficult to Learn". This measure
provided an average measure of the airman's job performance, as opposed
to TOTAL TASKS, which was a summation measure.

EQUIP ITEMS, the total number of equipment items the airman indicated that
he/she used on his/her present job.

FOLLOW-UP, the airman's overall job performance, as rated on the 30-point
follow-up survey scale.
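
The first two measures can be computed from one airman's booklet as in the
following sketch. It is a minimal illustration, not the Center's actual
scoring program: the function names, the three sample tasks, and the
difficulty values are hypothetical. Supervisor ratings are on the 7-point
scale and task difficulties on the 1-9 scale described above.

```python
from statistics import mean
from typing import Dict


def total_tasks(ratings: Dict[str, int]) -> int:
    """TOTAL TASKS: number of tasks for which the supervisor gave a rating."""
    return len(ratings)


def mean_diff_x_rating(ratings: Dict[str, int],
                       difficulty: Dict[str, float]) -> float:
    """X (DIFF x RATING): mean over rated tasks of difficulty times rating."""
    return mean(difficulty[task] * rating for task, rating in ratings.items())


# Hypothetical booklet: supervisor ratings (1-7) for the tasks checked off.
supervisor_ratings = {
    "Apply heat by compresses": 5,
    "Apply long leg plaster casts": 4,
    "Administer eye irrigations": 6,
}
# Hypothetical task learning difficulty values (1-9 scale).
task_difficulty = {
    "Apply heat by compresses": 3.0,
    "Apply long leg plaster casts": 6.5,
    "Administer eye irrigations": 4.0,
}

n_tasks = total_tasks(supervisor_ratings)
weighted_mean = mean_diff_x_rating(supervisor_ratings, task_difficulty)
```

Because the second measure is an average, it does not grow with the number
of tasks performed, which is why it complements the TOTAL TASKS count.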


Results  and  Discussion 
Differences  in  Job  Performance 

Table 1 presents the t-tests and summary statistics comparing the
two groups, and Figs. 2-5 present histograms for both groups. Neither
Table 1 nor any of the histograms shows any significant difference in
the mean performance levels of the two groups. The mean and median
performance levels of the bypassed specialists are slightly higher than
those of the tech school graduates for all four measures. However, the
column of t-tests shows that none of these differences is significant.
Bypassed specialists show significantly more variation in number of
equipment items used (test statistic = 4.18, p < .05, by Bartlett's test).
This trend is repeated, although nonsignificantly, in the case of TOTAL TASKS
and X (DIFF x RATING) but reversed in the case of FOLLOW-UP. Similarly,
the histograms reflect no substantial differences in terms of either
central tendency or significant numbers of outliers at the lower ends of
the distributions. These analyses indicate that the bypassed specialists
are at least capable of holding their own against the tech school graduates
in their first job assignments, if not slightly outperforming them.
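
For readers who wish to reproduce this kind of group comparison, the sketch
below computes a two-sample t statistic. It assumes the pooled-variance
(equal variance) form, which the paper does not confirm, and the function
name and data are hypothetical. The first argument plays the role of the
tech school group, so a negative t means the second (bypass) group has the
higher mean, as in Table 1.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, df) for a pooled-variance two-sample t-test of mean(a) - mean(b)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical scores for two small groups (illustrative values only).
tech_school = [1, 2, 3, 4, 5]
bypass = [2, 3, 4, 5, 6]
t_value, degrees_of_freedom = pooled_t(tech_school, bypass)
```

The resulting t is compared with the critical value for its degrees of
freedom to judge significance.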

Further analyses were conducted to control for the effects of various
background variables. One rather disturbing finding was the large number
of airmen (73% of the bypassed specialists and 58% of the tech school
graduates) who had already advanced to the five-skill level. Thus, the
groups were split by current skill level, and a 2 x 2 analysis of variance
(Career Field Entry Method x Current Skill Level) was performed on each
measure. For X (DIFF x RATING), EQUIP ITEMS, and FOLLOW-UP, there were
no significant main effects or interactions (all Fs < 2, with df = 1, 92).
For TOTAL TASKS, the bypassed specialists (X = 115.56) tended to outperform
the tech school graduates (X = 99.93); F (1, 92) = 3.49, p = .06. The main
effect of Skill Level (F = 1.62) and the interaction (F = 1.83) were
nonsignificant. Similarly, analyses of covariance, which incorporated various
background variables as covariates, failed to reveal any significant or
Table 1

Comparative Job Performance Data

                                          Measure
Variable       Group          Mean         Mdn       SD        t

TOTAL TASKS    Tech School    I'ri. .      103.50    42.7a     -1.29
               Bypass                      112.50    53.611

X (D x R)      Tech School    12^.. , 0    23.25     4.7?      -0.84
               Bypass         ~.r-:3       23.42     5.2C

EQUIP ITEMS    Tech School    5.57"        16.17     7.3C      -0.63
               Bypass         "  f         17.10     9.75

FOLLOW-UP      Tech School                 25.59     6.19      -0.85
               Bypass                      25.75     5.77

noteworthy differences between the two groups. These analyses consistently
demonstrate that, at least for this specialty, the bypassed specialists
perform up to the level of the technical school graduates.

Effects of Criterion Referencing on the 902X0 Career Field

An additional question of considerable interest is the degree to which
the criterion referencing procedure will affect the actual job performances
of bypassed specialists in this career field. This specialty was one of
those examined in the earlier criterion referencing studies (Vaughan, 1976b).
Previous tech school graduates had taken the same AKT that the bypassed
specialists in this study took, and the raw score corresponding to the
tenth percentile of the tech school graduates' performances was computed.
This is the passing score that would be established under the proposed
criterion-referenced scoring system. In this case, that raw score was
34 (out of a possible score of 65 points). The actual passing score
for this AKT varied between 26 and 32 points (depending on the performance
of previous AKT candidates). Thus, there is a cluster of bypassed specialists
within this sample who passed the AKT but who would have failed under the new
system. These "theoretical failures" are plotted in Figs. 2-5 as the small
arrows above each histogram for the bypass group.
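
The identification of theoretical failures can be sketched as follows. The
record layout, the airman identifiers, and the sample scores are
hypothetical. The sketch also assumes that a candidate passes when his/her
raw score is at or above the passing score, which the paper does not state
explicitly. Only the proposed passing score of 34 is taken from the text.

```python
from typing import List, NamedTuple


class Examinee(NamedTuple):
    airman_id: str          # hypothetical identifier
    akt_score: int          # raw AKT score out of 65
    cutoff_in_effect: int   # passing score when the airman tested (26 to 32)


PROPOSED_CUTOFF = 34  # tenth percentile of tech school graduates, per the text


def theoretical_failures(examinees: List[Examinee],
                         proposed_cutoff: int = PROPOSED_CUTOFF) -> List[str]:
    """Airmen who passed under the cutoff in effect but fall below the proposed one."""
    return [
        e.airman_id
        for e in examinees
        if e.cutoff_in_effect <= e.akt_score < proposed_cutoff
    ]


# Hypothetical sample (illustrative values only).
sample = [
    Examinee("A", 30, 26),  # passed at 26, below 34: theoretical failure
    Examinee("B", 36, 32),  # passes under both systems
    Examinee("C", 33, 32),  # passed at 32, below 34: theoretical failure
    Examinee("D", 25, 26),  # failed under the actual cutoff, so not bypassed
]
flagged = theoretical_failures(sample)
```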

It is obvious that the theoretical failures are quite evenly scattered
throughout the distributions and that criterion referencing would have
very little effect on the job performances of bypassed specialists in this
career field. However, it does not seem appropriate at this point to
conclude, in general, that criterion referencing will not have any
effect on job performance. The Medical Services career field is fairly
unusual in two respects. First, the theoretical passing score was fairly
close to the actual passing score (ranging from two to eight points away).
Second, there is a great deal of pre-service experience available in this
career field (e.g., orderly, nurse's aide). It is likely that most AKT
examinees in this area who claim relevant job experience or job knowledge
are more-or-less qualified to do the work. Exact placement of the criterion
may not be too important in this type of specialty. The case may be quite
different for other career fields. This state of affairs indicates the
need to replicate this study in other career fields.

Figure 2. Distributions of TOTAL TASKS scores.

Figure 3. Distributions of X (DIFF x RATING) scores.

Figure 4. Distributions of EQUIP ITEMS scores.

Figure 5. Distributions of FOLLOW-UP scores.

Criterion-Related Validation of the AKT

This study provides a unique opportunity to validate one of the
Center's tests against certain job performance measures. If, in fact,
the AKT does not correlate with at least some of these measures, this
finding in itself would have important implications for use of the AKT
and positioning of the pass/fail criterion.

The correlation coefficients between AKT scores and the performance
measures indicate that the AKT did not correlate significantly with X (DIFF x
RATING), r (34) = .12; EQUIP ITEMS, r (34) = .03; or FOLLOW-UP, r (32) =
.03. However, the AKT did correlate significantly with TOTAL TASKS, r (34) =
.33, p < .05. This pattern of correlations indicates that AKT scores do not
predict the airman's average performance level, but they do predict what, or
how many different things, the airman can do. Therefore, the AKT does
appear to be a good screening instrument for determining award of the three-
skill level.
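
The significance of these correlations can be checked with the standard
conversion of a correlation to a t statistic, t = r * sqrt(df / (1 - r^2)).
The sketch below applies it to the reported values. It assumes a two-tailed
test at the .05 level, which the paper does not specify, and uses the
tabled critical value of about 2.032 for 34 degrees of freedom. The
function and constant names are hypothetical.

```python
from math import sqrt


def t_from_r(r: float, df: int) -> float:
    """Convert a Pearson correlation with df = n - 2 into a t statistic."""
    return r * sqrt(df / (1.0 - r * r))


# Tabled two-tailed .05 critical value of t for 34 degrees of freedom.
CRITICAL_T_DF34 = 2.032

# Correlations of AKT score with each measure, as reported in the text.
t_total_tasks = t_from_r(0.33, 34)     # about 2.04, just above the critical value
t_diff_x_rating = t_from_r(0.12, 34)   # about 0.70, well below it

total_tasks_significant = t_total_tasks > CRITICAL_T_DF34
diff_x_rating_significant = t_diff_x_rating > CRITICAL_T_DF34
```

Under these assumptions the TOTAL TASKS correlation clears the .05
threshold only narrowly, consistent with the reported p < .05.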


Conclusions 

The major finding of this study is that the bypassed specialists did
about as well as - even slightly better than - the technical school
graduates on all performance measures. This finding holds for both the
raw measures and the measures corrected for various background variables.
The implication of these results for the major question of this study -
where to set the AKT passing criterion - is quite clear: The criterion
should be set no higher than the tenth percentile of technical school
graduates' scores on the AKT. A higher cutoff would only tend to block
the flow of qualified three-level airmen into the career field.

The question of the effect of criterion referencing on the job
performance levels of bypassed specialists is less easily answered.
Certainly the higher pass/fail point would have little effect in this
career field. However, it is not clear that this finding will generalize
to other career fields. We anticipate that the effect may be quite
different in specialties where pre-service experience is less readily
available or where the criterion-referenced passing score is farther
from the current passing score. This issue can only be resolved by
additional job performance studies of bypassed specialties.

We feel that the Occupational Survey-based job performance survey
technique will prove to be of considerable value not only in terms of
bypassed specialist performance but in a variety of other situations
as well. However, our immediate concerns are with further performance
studies of the bypassed specialist population. As the criterion-referencing
system goes into effect, it will become important to extend
these findings to other career fields. The present data indicate no
substantial differences between bypassed specialists and technical school
graduates. If these findings generalize to other career fields, we can
accept the tenth percentile of technical school graduates' AKT scores
as the passing point for the AKT with considerable confidence.
ANALYSIS OF HEAVY EQUIPMENT OPERATOR JOBS

I.     PROBLEMS  AND  OBJECTIVES 

This report concerns a project known as the TRAINING STANDARDS PROJECT
(TSP), conducted for the past two-and-a-half years under the auspices and
with the active participation of the International Union of Operating
Engineers (IUOE).* The union, in collaboration with contractors, conducts a
national apprentice training program at some 75 training centers operated
by local unions throughout the United States.

During the early 1970's, class action suits initiated by individuals
were brought against several IUOE locals charging racial discrimination in
selection for apprenticeship training. Among the charges were that the
required high school diplomas, language and mathematics requirements of
qualification tests, and the length (4 years) of apprenticeship were either
irrelevant to the work or unnecessary to achieve competence. The work of the
operating engineer, it was charged, was much simpler than was claimed by the
union, was so classified in the Dictionary of Occupational Titles of the
United States Employment Service, and could be learned in a much shorter
period of time.

In view of the inadequacy of data to deal with these charges, the union
sought and obtained research to establish for itself and for the
public the true nature of its work and the skill required. In undertaking
this research, it not only was concerned with the response to the courts
and affirmative action in the area of equal employment opportunity, but also
with the improvement of its own training practices. Included in its
objectives for the Training Standards Project were:

•  To define the work of the operating engineer so that the
knowledge, skills and abilities required could be
satisfactorily communicated to the courts and the public.

•  To establish training standards for every important
operating engineering task.

•  To provide a basis for more objective and defensible
apprentice selection procedures, namely, tests.

*A Michael Collins, the union's present monitor, worked very closely
with ARRO personnel, particularly in arranging and managing the active
participation and contribution of union members.
In the initial planning of the project around these objectives, a
number of needs emerged that dictated the choice of methodology and procedure.
They were as follows:

•  The job analysis data developed needed to satisfy the
courts as to the level of knowledges, skills, and abilities
required, but also needed to

•  define performance standards, and

•  training required to meet the standards, and take into account

•  regional and environmental differences in performance.
Finally, the analysis had to produce

•  content valid measures that did not result in discrimination
against particular groups of people in our society.

This report will focus on the work done to satisfy these needs and meet the
union's objectives.

II.    TECHNICAL  APPROACH 

Job Analysis Phase. The union employed S. A. Fine Associates to design,
manage and carry out the research, a decision made to some degree because of
their interest in the Functional Job Analysis (FJA)* methodology used by
this organization. FJA focuses on tasks which are formulated as fundamental,
stable units consisting of a behavior and a result. These tasks are
organized into job assignments in one combination or another to accomplish a
job of work. Data for preparing task statements are obtained in observation/
interviews. In addition to the behavior and result, the task statement
includes information about the resources the worker draws upon--the machine
tools and equipment used and the level of instructions, that is, the
prescription/discretion mix that the worker must follow. The accuracy and
reliability of the task statement are controlled by 10 ratings on ordinal
scales functionally defined that establish the level of complexity with
regard to Things, Data, People, Instructions, Reasoning, Mathematics, and
Language. From this information it is possible to directly formulate
Performance Standards and Training Requirements. The complete task analysis
provides the information to fulfill the paradigm: To do this task to these
standards, the worker needs this training.
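
The parts of a task statement described above can be pictured as a simple
record. The sketch below is a hypothetical illustration, not an official
FJA format: the class, field, and method names are invented, the rating
levels are placeholders, and only the seven scale names listed in the text
are modeled (the text mentions 10 ratings but does not name the rest).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven scale names given in the text.
SCALES = ("Things", "Data", "People", "Instructions",
          "Reasoning", "Mathematics", "Language")


@dataclass
class TaskStatement:
    behavior: str                      # what the worker does
    result: str                        # what gets accomplished
    equipment: List[str]               # machines, tools and equipment drawn upon
    ratings: Dict[str, int]            # scale name -> ordinal level (placeholder values)
    performance_standards: List[str] = field(default_factory=list)
    training_requirements: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        unknown = set(self.ratings) - set(SCALES)
        if unknown:
            raise ValueError(f"unknown scales: {sorted(unknown)}")

    def paradigm(self) -> str:
        """Render the paradigm: task, standards, training."""
        return (f"To do this task ({self.behavior}) "
                f"to these standards ({'; '.join(self.performance_standards)}), "
                f"the worker needs this training "
                f"({'; '.join(self.training_requirements)}).")


# Hypothetical example; the rating levels are placeholders, not project data.
grader_task = TaskStatement(
    behavior="operates grader",
    result="basic outputs such as backfilling and snow removal",
    equipment=["grader"],
    ratings={"Things": 3, "Data": 2, "People": 1},
    performance_standards=["operates equipment properly", "is alert and attentive"],
    training_requirements=["how to operate grader"],
)
summary = grader_task.paradigm()
```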


*S. A. Fine and W. W. Wiley. An introduction to Functional Job Analysis:
A scaling of selected tasks from the welfare field. No. 4.
The W. E. Upjohn Institute for Employment Research, September 1971.

Functional Job Analysis was used to develop baseline information about
operator requirements necessary to perform the tasks and produce the
necessary outputs that are within the capability of a piece of equipment. Job
analysis was conducted for 16 kinds of construction equipment normally
operated by operating engineers. This paper deals only with the bulldozer,
backhoe, loader, grader, and scraper, in the so-called blade category of
equipment. To illustrate, a completed task statement for the Grader is shown
in Fig. 1.

A cadre of some 20 senior operating engineers, engaged in apprenticeship
training and experienced across the full range of the jobs being analyzed,
received a week of training in Functional Job Analysis methods. They then
took on assignments as individuals to serve as expert consultants to the
consulting psychologists on one or another piece of equipment. Through them
arrangements were made to visit training and job sites where observation/
interviews were conducted. Subsequently, in task force groups of 3 or 4,
they reviewed the various drafts of the task analysis for accuracy, coverage,
and communicability. Task statements were then edited and made consistent
in form by the consulting psychologist. The final step in the job
analysis phase was the assembly of the total group of 20 FJA-trained operators
where the assembly as a whole reached consensus on what should be included
for each item of equipment.

Performance Standards Phase. Performance standards for the operation
of a piece of heavy equipment are intended to describe the jobs that an
experienced operator should be capable of performing with that machine. The
standards are cast in terms of specific outputs (types of results) and
operator behaviors required to accomplish each output safely, efficiently, and
effectively.

The job analysis for an item of equipment is usually represented in
seven task statements (seven printed pages):

•  Inspects the equipment (prior to operation)

•  Services the equipment

•  Starts the equipment

•  Operates the equipment--basic outputs

•  Operates the equipment--intermediate outputs

•  Operates the equipment--difficult outputs

•  Shuts down the equipment.
[Figure 1 shows the completed task statement form for the task "Operates
Grader - Output Basic." The form carries rating boxes for Things, Data,
People, Worker Instructions, and General Educational Development
(Reasoning, Math, Language). Its task description refers to monitoring
the performance of the equipment and remaining constantly alert to the
presence and safety of others. The remaining entries are listed below.]

Objective: Backfilling, scarifying, windrowing, cutting firebreak,
maintaining haul road, snow removal.

Performance Standards (To Perform This Task to These Standards):
- Operates equipment properly.
- Is alert and attentive.
- All work meets work order requirements.
- No accidents/damage due to improper operating techniques.

Training Content, Functional and Specific (The Worker Needs This Training):
- How to operate grader.
- Backfilling, scarifying, windrowing, cutting firebreak, maintaining
  road, snow removal.
- Knowledge of specific grader.
- Knowledge of work requirements.
- Knowledge of specific job site (i.e., layout, soil condition,
  environment).

Figure 1. Illustrated Task Statement for the grader


The performance standards are detailed expansions of the standards listed in
the task statements, exploring behavioral implementations of various
contingencies as well as critical "know-how" developed through experience. They
run from 100 to 150 pages for each piece of equipment.

The standards are stated in terms of those that are primarily mental in
nature, requiring planning, monitoring, and checking; those that require
interpersonal relationships; and those that require the combination of
perception and physical coordination to accomplish the operations, such as
manipulating controls and operating the equipment to meet work specifications.
For example, the backhoe output of "precision excavating"* is described in 24
mental/planning/monitoring standards, 7 interpersonal standards, and 43
physical action standards.

The  process  for  developing  performance  standards  comprised  four  steps: 

-    The psychological consultant prepared a preliminary
draft of the performance standards to establish a
common format among all standards.

-    The preliminary draft was reviewed during a two-day
meeting of the consultant and a subject matter expert
for that piece of equipment.

-    The standards were revised, incorporating changes
decided on in the previous step, and resubmitted
to the subject matter expert for approval.

-    The proposed standards were reviewed and revised by
a Task Force selected for that piece of equipment.
In the "task force" review meeting (requiring two
days), each output was discussed, one-by-one, and a
decision was reached as to proper wording and description
of each performance within the output.

Performance standards task forces of 4 to 6 subject matter experts were formed
for each piece of equipment in the project.  Task force members were selected
to be geographically representative of operating engineers nationwide, so
that variations in operating practice as a function of climate, region of the
nation, equipment model preferences, and so on, were taken into account.

Test Development Phase.  Work sample performance tests were developed
directly from the performance standards.  Those tasks that were most often
performed were chosen on the advice of the task force subject matter experts.

*The fully qualified backhoe operator should be able to perform eight
outputs (in addition to the common outputs of inspection/servicing and
start-up/shut-down):  (1) compacting with a vibratory attachment, (2)
loading a haul vehicle, (3) removing trees and stumps, (4) pavement
breaking, (5) filling and backfilling, (6) hoisting, (7) placing riprap,
and (8) precision excavating.

Most test layouts consisted of a set of formalized work requirements.  The
operator being examined received instructions much in the same way that a
job foreman would issue them.  The operator then read the grade stakes and
performed the earth moving necessary to meet the job specifications.  Test
situations varied from 1 1/2 to 3 hours.  The tasks making up the work samples
are shown in Table 1.

Performance on the work sample was timed, and measured by a series of
test items drawn nearly verbatim from the performance standards.  Items are
statements covering three general areas of equipment operation:  skill in
operating the equipment, the safety behavior and practices demonstrated, and
the extent to which the job specifications were met.  The items are arranged
in checklist format with space for (a) simple Yes/No checks to indicate
whether or not the behavior was observed, and (b) ratings on a 5-point scale
(1 = poor, 5 = superior) for overall performance and the satisfying of task
specifications.  There are generally about 20 items of each type for each
output of the test.  Each test was tried out before use to assure the correct
time allocation, sufficiency of the instructions, appropriateness of the tasks
performed, and the ease with which the test could be used.

Test Administration.  Validation of the work sample performance tests
has been conducted with the same care and attention to detail exercised in
the development of performance standards.  The locations of the week-long
validation tests are shown in Table 2.  Again, test sites were chosen
to be geographically dispersed across the nation.  The bulldozer test
validation was the first conducted, with testing of 28-32 operators at each
of four locations.  It later was decided that more subject operators could be
tested at fewer locations without sacrifice to the integrity of the
validation.  For all subsequent validation testing, 36-42 operators were
tested at each location.  In all, 360 operators made up the validation sample.

The validation strategy was to test operators of prejudged, known skill
levels, with test administration and scoring by subject matter experts (in the
study, the test administrators are called "observers") who have no knowledge
of the prejudged, known skill levels of the operators.  Then, if a test
differentiates as presumed, the most highly skilled operators will perform
better on the test than the average, who in turn will perform better than the
least skilled.  It is obvious that the operator selection beforehand is the
key to a "valid" validation.

TABLE 1

Outputs Tested for Each Piece of Equipment

Bulldozer (3 hours)        Excavate for foundation, backfill
                           Finish a slope
                           Pushload scraper, run fill
                           Cut and fill, build ramp
                           Build bench

Backhoe (2 hours)          Excavate vertical wall trench
                           Expose buried pipe
                           Excavate sloping wall trench
                           Excavate pier hole

Loader (1 1/2 hours)       Excavate basement
                           Form spoil pile
                           Load haul vehicle from stockpile

Grader (2 1/2 hours)       Build maintenance road
                           Cut rough ditches
                           Level material and crown road
                           Construct V-ditch to grade
                           Finish grade to a flat surface

Scraper (Varied)           Load scraper
                           Haul material to fill area
                           Unload scraper
                           Return to cut area


TABLE 2

Locations and Dates of Validation Testing

Equipment      Date             Place

Bulldozer      May 1977         Cleveland, Ohio
Bulldozer      May 1977         Philadelphia, Pa.
Bulldozer      May 1977         Des Moines, Iowa
Bulldozer      May 1977         Sacramento, Calif.
Backhoe        May 1978         Dayton, N. J.
Loader         June 1978        Beaumont, Calif.
Loader         July 1978        New Alexandria, Pa.
Grader         June 1978        Columbus, Ohio
Scraper        July 1978        Richmondville, N. Y.
Scraper        August 1978      Seattle, Wash.

The idealized quota of subject operators at each location was 32 operators
for bulldozer testing and 40 operators for the other kinds of equipment,
distributed as follows:

                                                        Other
                                          Bulldozer   Equipment

Level 3 Journeyman Operators                   8          10
Level 2 Journeyman Operators                   8          10
Level 1 Journeyman Operators                   8          10
Senior or Newly Graduated Apprentices          8          10

                                              32          40


The description of each category is shown in Table 3.  In addition to the
quota by skill level, operator selection committees were instructed to
select, if possible, without compromising skill level representation, 30
percent minority.*  Minority representation, in actuality, turned out to be
90 out of 360 or 25 percent.
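
The quota arithmetic above can be checked with a short sketch. The number of test locations per piece of equipment is read from Table 2, and the per-level quotas from the distribution just given; the variable and function names are illustrative only and are not part of the original study.

```python
# Test locations per piece of equipment, as listed in Table 2.
locations = {"Bulldozer": 4, "Backhoe": 1, "Loader": 2, "Grader": 1, "Scraper": 2}

# Four skill categories (Levels 3, 2, 1 and apprentices) per location.
SKILL_CATEGORIES = 4


def quota_per_location(equipment):
    """Idealized quota: 8 operators per category for bulldozer, 10 otherwise."""
    per_category = 8 if equipment == "Bulldozer" else 10
    return SKILL_CATEGORIES * per_category


# Idealized sample size if every location had filled its quota exactly.
idealized_total = sum(n * quota_per_location(eq) for eq, n in locations.items())

# Figures reported in the text.
actual_total = 360
minority_operators = 90
minority_share = minority_operators / actual_total

print(f"Idealized total: {idealized_total}, actual total: {actual_total}")
print(f"Minority share: {minority_share:.0%} (target was 30%)")
```

Under these assumptions the idealized total is 4 x 32 + 6 x 40 = 368 operators, so the 360 actually tested fell slightly short of the full quota.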

Operator selection committees were formed at each test location.  Committee
members included Operating Engineer Apprenticeship Coordinators, some
contractors, and business agents and dispatchers for the local union.  The
skill category classification of the operators selected to participate was
held strictly confidential, with no one other than the selection committee
and psychologist in charge knowing these classifications.

III.  RESULTS

The data analysis and results for each test focused on the following
important issues:

-    Does the test do what it is supposed to do?  Is it valid?

-    In considering recent EEOC Guidelines, what influence do
     certain operator characteristics, specifically race, have
     on test performance and validity?

The analysis was designed to determine whether the tests in whole and in
part differentiate between the four criterion groups, and whether racial
membership creates differences in test performance and test validity.

To address these issues and to answer related questions, three areas
were explored:  (1) the statistical organization of the tests, (2) the
validity of each test, and (3) the differences between white and minority
operators' test performance.

*As distinguished from the nonminority, "White," operators, the Minority
operators include:  Black (origins in Black African racial groups),
Hispanic (Mexican, Puerto Rican, Cuban, Central or South American or other
Spanish culture), Asian or Pacific Islander, American Indian, and
Alaskan Native.

Operator Selection Criteria

Level 3 Journeyman--This class of operator is the most expert of all on
that particular piece of equipment and is considered to be a "top hand."
Such an operator can work to specifications essentially on his or her own,
and usually can perform all of the outputs of which the machine is capable;
most significantly, the Level 3 operator's work is likely to never need
follow-up by another operator.

Level 2 Journeyman--This class of operator is the broad class of "average"
operators.  These operators may be skillful in some outputs, but never in
all.  This class of operator usually can manage on his or her own without
much supervision.  However, the Level 2 operator's work occasionally will
need follow-up by a more skilled operator.

Level 1 Journeyman--This class of operator may not have had experience on
all of the outputs that a machine is capable of performing, or may, despite
experience, lack the skills to perform the outputs well; the operator needs
a lot of supervision.  What is most characteristic is that the operator's
work often will not meet, exactly, performance criteria or output
specifications; most significantly, the Level 1 operator's work will often
need follow-up by a more skilled operator.

Apprentice--The apprentice could fall within any of the three journeyman skill
levels.  Most likely, however, since apprentices' experience usually is
limited, the apprentice skill level will be at Journeyman Level 1 or lower.
All that is required for the performance checklist validation is that the
apprentices participating (a) be in their third or fourth years of
apprenticeship training (or recent graduates), and (b) have had some training
on the piece of equipment being tested.

Organization of the Performance Tests.  To search for commonality among
test items, a principal components analysis was performed on three of the
performance tests (i.e., loader, backhoe, and grader).  The analyses were
followed by several orthogonal rotations and an oblique rotation.  Of these,
the Varimax method provided the most simplified and meaningful factorial
structures.

With some slight variation, the analyses yielded three components of
heavy equipment operations that were relatively stable across the three
pieces of equipment.  The components were:  using correct procedure and
meeting specifications, operating with caution and safety, and following
instructions.  For example, in using the correct procedures and meeting
specifications, the expert loader operator manipulates controls with
precision and smoothness, travels forward into material, tilts bucket to aid
breakout, fills bucket without straining the engine, empties bucket at 45
degree angle, and obtains uniform grade.  The expert operator also functions
with caution and safety, such as not abusing equipment and following safety
rules.  To follow instructions, he/she comprehends instructions from
supervision and interprets grade stakes properly.

Criterion-related Validity.  The item analysis for each of the five
performance tests included the usual statistics, such as means, standard
deviations, and frequency distributions.  The discrimination index was also
calculated for each item to determine if the item indicated differences
between levels of operator skill.

Table 4 illustrates an item from the grader test that significantly
differentiated between the four criterion groups.  In other words, the
operators who rated high on the item (i.e., the operator manipulated the
controls with precision and smoothness) were in fact the participants
identified by the selection committee as expert and above average operators,
while the participants who were rated lower on the item had been classified
as below average operators and senior apprentices.

The item analyses indicated that, overall, about 80 percent of the test
items significantly differentiated between the levels of operator skill.
The results demonstrate that a majority of the test contents are valid and
therefore indicative of the operator's level of competence.

TABLE 4

Example of a Discriminating Question

Question:  Did the grader operator manipulate controls with precision
and smoothness?

                     Observer Ratings           Total No.    Aver.
Skill Level      1     2     3     4     5      of Oper.     Rating

3                            1     1    11         13         4.77
2                1           1     3     3          8         3.87
1                3     3     5     2               13         2.46
App.                   2     2     1                5         2.80

Totals           4     5     9     7    14         39

p < .001 (highly statistically significant)
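
As a worked check on Table 4, the average rating for each skill level can be recomputed from the rating counts. This is only a sketch: the placement of each count under a particular rating column is an assumption inferred from the printed row averages and column totals, and the names below are illustrative.

```python
# Counts of observer ratings 1..5 for each prejudged skill level (Table 4).
# Column placement is inferred from the printed averages and column totals.
rating_counts = {
    "Level 3": [0, 0, 1, 1, 11],
    "Level 2": [1, 0, 1, 3, 3],
    "Level 1": [3, 3, 5, 2, 0],
    "App.":    [0, 2, 2, 1, 0],
}


def average_rating(counts):
    """Weighted mean of the ratings 1..5 given the count at each rating."""
    total = sum(counts)
    weighted = sum(rating * n for rating, n in enumerate(counts, start=1))
    return weighted / total


# Column totals across skill levels, for comparison with the Totals row.
column_totals = [sum(col) for col in zip(*rating_counts.values())]

for level, counts in rating_counts.items():
    print(f"{level:8s} n={sum(counts):2d}  average={average_rating(counts):.3f}")
print("Column totals:", column_totals, "N =", sum(column_totals))
```

With this placement the row averages agree with the printed values to within rounding, and the column totals reproduce the Totals row (4, 5, 9, 7, 14; N = 39). Note that the apprentices' average falls slightly above that of the Level 1 journeymen on this item.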


[Figure: horizontal axis, Skill Level (3, 2, 1, App.)]

Since the overall rating of operator performance was significantly
correlated with the remaining items in each test, it was used to demonstrate
the overall validity of the tests.  Figure 2 clearly demonstrates the
significant relationships between test performance and skill level.

Of further importance, findings for the loader performance test
demonstrate the uniqueness of loader operations.  The principal components
analysis yielded three dimensions similar to the other equipment (e.g.,
grader and backhoe); however, in contrast to other equipment, efficiency of
performance and economy of effort (e.g., operator avoids excessive turning
and traveling) emerged as salient aspects of loader operations.

During the development of the loader performance standards and tests,
it became evident that an effective indicator of efficiency is the
operator's cycle time (i.e., the period from when the operator initially
tilts the bucket to empty the material until he/she begins to tilt the bucket
to dump the next load of materials).  Because of its critical nature, cycle
time was included as an item and recorded for each operator during the test.
Figure 3 illustrates again the validity of the loader test.  Those
participants who had previously been classified as below average operators
and apprentices had cycle times of about 60 seconds, while the participants
who were previously classified as expert operators had significantly shorter
cycle times of about 37 seconds.  The expert loader operators demonstrated
more efficiency and economy of effort in their performance during the
test than the below average operators and apprentices.

To summarize, the performance-based tests for the five pieces of
equipment are doing what they were intended to do, i.e., differentiating
between levels of operator skill.  The results demonstrate criterion-related
validity.

Differences in Test Performance and Validity for White Operators and
Minority Operators.  The next phase of the analysis involved separating
the total sample of participants for all pieces of equipment into white
operators (N = 290) and minority operators (N = 70).  Figure 4 illustrates
the results.

Even though there are slight differences in test performance between
white and minority operators at each skill level, the important point
illustrated by the figure is that as skill level increases for both white and
minority operators, so does their test performance.  In other words, the
tests are valid for both groups of operators, although at somewhat different
levels of performance.

Figure 4.  Average performance for all five tests
for minority (M) and white (W) operators

To further investigate the differences in test performance between the
white and minority groups, we looked at differences in education, age, and
experience.  In contrast with age and education, the difference between the
number of years of experience in heavy equipment operations between white
and minority participants was highly significant.  It appears that these
differences in experience between whites and minorities partially explain
the differences in test performance.  In other words, the minority
operators have less experience with heavy equipment operations than do white
operators.  Consequently, as a group, they have acquired fewer of the
necessary operating skills and thus perform less well on the tests.

In summary, while there were differences in test performance between
white and minority operators, the test performance for both racial groups
increased significantly as their skill level increased.  The tests are
valid for both white and minority operators.

IV.  CONCLUSIONS 

The conclusions we have drawn from the data presented can be grouped
under three headings (Content, Criterion, and Construct validity), topics
of central concern in the EEOC Guidelines.

Content Validity.  The tests are content valid.  The items on the
tests are performance based, reflect representative outputs for each piece
of equipment, and have been selected by subject matter experts of the craft.
Although not yet tested in the courts, there seems little doubt that the
tests will meet the guideline requirements as fair measures for
qualifying apprentices as journeymen.

The performance tests proved to be a positive, satisfying learning
experience for observers and operators at all levels.  A typical post-test
operator reaction was:  "What a great way to check myself out."  Observers,
several of whom were instructors, felt the tests could serve as effective
checklists for union instructors.  Because of these kinds of reactions and
the demonstrated validity, the union has begun the production of slide/tape
training films for apprentices using the same performance standards
as the basic source for script and associated curriculum materials.

Criterion Validity.  Using the "known group" technique for establishing
criterion groups, in a concurrent validity frame of reference, it is
clear that the tests have criterion validity.  By and large, the tests
significantly discriminate between skill levels among operating engineers in
which the significant criterion factor is the degree of independence and
autonomy the operator can be permitted in doing the work.  This is of
vital importance in a trade in which the operator is entrusted with
extremely expensive machines and charged with accomplishment of work basic to
both the success of a construction project and the safety of many other
workers.

Construct Validity.  The factor analyses indicate the possibility of
three constructs pertinent to the functional and specific content of heavy
equipment operation.  The constructs are the three components yielded by
principal components analysis, namely:

•  Using correct machine operating procedures and meeting
   output specifications.

•  Following instructions (these are in good measure
   operator initiated).

•  Operating with caution and safety.

The three constructs correlate closely with the Things, Data, People
categories for the generation of the performance standards.

It is noteworthy that the data and people relationships of operating
engineers, usually overlooked in standard descriptions, were uncovered and
articulated by the use of Functional Job Analysis as significantly involved
in their work.  The Things, Data, People aspects of the components need to be
further researched as a basis for developing aptitude tests for entry
applicants.


PREDICTIVE UTILITY OF THE OFFICER
EVALUATION BATTERY (OEB)


Arthur C. F. Gilbert, Ph.D.
U. S. Army Research Institute for the
Behavioral and Social Sciences
Alexandria, Virginia 22333

The Officer Evaluation Battery (OEB) was designed to measure a number of
dimensions predictive of success as an Army officer.  The Officer
Evaluation Battery is essentially the same test battery as the Cadet
Evaluation Battery (CEB), the development of which is discussed by Mohr and
Rumsey (1978).  The only difference between the two instruments is in terms
of purpose of administration.  The CEB is administered to cadets in the
Army Reserve Officers Training Corps (ROTC) for selection and/or counseling
purposes while the OEB is for administration to newly commissioned officers
for experimental purposes.

The Officer Evaluation Battery consists of cognitive and non-cognitive
subtests.  The seven subtests are Combat Leadership (Cognitive), Technical
Managerial Leadership (Cognitive), Career Potential (Cognitive),
Combat Leadership (Non-Cognitive), Technical Managerial Leadership
(Non-Cognitive), Career Potential (Non-Cognitive), and Career Intent (a
non-cognitive scale).  The seven subtests of the OEB and the types of items
in each are shown in Table 1.  Earlier research (Helme, Willemin, and
Grafton, 1974) indicated the utility of the OEB item content in predicting
success in a simulated combat situation.

The purpose of this research was to evaluate the predictive utility of the
OEB in Officer Basic Courses (OBC).  An Officer Basic Course exists for each
of the 13 Career Branches in the Army.  A newly commissioned officer attends
one of these courses on entering upon active duty prior to his initial duty
assignment.  The research focused on evaluating the predictive
effectiveness of each of the subtests and the combination of subtests in
relation to Officer Basic Course final grades for the total sample and within
the three different types of Officer Basic Courses:  Combat Arms, Combat
Support, and Service Support.  Another purpose of the research was to
determine if there were differences in prediction for males and for
females and to evaluate possible differences in prediction for black
officers and for white officers.


The views expressed in this paper are those of the author and do not
necessarily reflect the view of the U. S. Army Research Institute or the
Department of the Army.

Table 1


Officer Evaluation Battery (OEB) Subtests and Description of Items


SUBTEST                               DESCRIPTION OF ITEMS

Combat Leadership (Cognitive)         Military tactics; practical skills
                                      in a variety of areas ranging from
                                      outdoor activities to mechanical
                                      and electronic applications.

Technical-Managerial Leadership       History; politics; culture;
(Cognitive)                           mathematics; physical sciences

Career Potential (Cognitive)          Technological knowledge relevant
                                      to military requirements.

Combat Leadership (Non-Cognitive)     Combat leader qualities, occupational
                                      interests, sports interest, outdoor
                                      interests related to combat
                                      leadership

Technical-Managerial Leadership       Mathematics and physical sciences
(Non-Cognitive)                       skills and interest; urban or rural
                                      background; scientific interest and
                                      ability; decisive leader qualities;
                                      and verbal-social leadership

Career Potential (Non-Cognitive)      Clerical-administrative interest,
                                      versus white collar interest,
                                      combat interest

Career Intent                         Intention of making the Army a career
                                      choice

Procedure


The Officer Evaluation Battery was administered to all officers in
the 13 Career Branches who attended Officer Basic Courses during Fiscal Year
1974.  Final OBC course grades (i.e., the criterion measure) were
collected from each OBC for as many subjects as possible.

The initial sample consisted of 9,180 officers but this sample included
many officers who entered on active duty for training only and who did
not enter a duty assignment on completion of the Officer Basic Course.
Since it was felt desirable to keep the validation sample as homogeneous
as possible, only those officers who continued on in an active duty
status after Officer Basic Course were retained in the validation sample.
A total of 4,622 were so identified.  However, for some of these officers,
complete data (i.e., OEB scores and OBC final course grades) were not
available.

Final course grades were reported by the different schools as either
percentage grades or as class standings within the OBC; in some cases
both percentage grades and class standings were reported.  When only
grades were reported, they were rank ordered and a class standing was
generated for each student.  The resulting class standings were converted to
Army standard scores.  Where class standing was available, it was converted
directly to an Army standard score.
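
The conversion described above can be sketched as follows. The paper does not give the conversion formula, so this sketch rests on stated assumptions: each class standing is turned into a mid-rank proportion, mapped to a normal deviate, and rescaled to a mean of 100 and a standard deviation of 20 (the scale usually associated with Army standard scores). The function names and sample grades are hypothetical.

```python
from statistics import NormalDist


def class_standings(grades):
    """Rank percentage grades (1 = highest); tied grades share the mean rank."""
    order = sorted(range(len(grades)), key=lambda i: -grades[i])
    ranks = [0.0] * len(grades)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of students tied with the student at position i.
        while j + 1 < len(order) and grades[order[j + 1]] == grades[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def standard_score(rank, class_size, mean=100.0, sd=20.0):
    """Assumed conversion: mid-rank proportion -> normal deviate -> rescaled score."""
    proportion_below = 1.0 - (rank - 0.5) / class_size
    z = NormalDist().inv_cdf(proportion_below)
    return mean + sd * z


# Hypothetical class of four students with percentage grades.
grades = [90.0, 85.0, 85.0, 70.0]
ranks = class_standings(grades)
scores = [standard_score(r, len(grades)) for r in ranks]
for g, r, s in zip(grades, ranks, scores):
    print(f"grade={g:5.1f}  standing={r:3.1f}  standard score={s:6.1f}")
```

One reason for converting to a common standard scale is that it makes standings comparable across courses of different sizes, which matters when grades from 13 branches are pooled into one criterion.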

A multiple regression analysis was performed using all seven subtests
of the OEB as predictors with final OBC grades as the criterion for the
total sample.  The intercorrelation matrix was computed using pairwise
deletion of missing values in the data matrix.  The total sample was
divided on the basis of type of Officer Basic Course (i.e., Combat Arms,
Combat Support, and Service Support).  The total sample was also divided
on the basis of sex and finally on the basis of race (i.e., black officers
and white officers).  Since these breakdowns could only be accomplished
where classification data were available, the number in the subgroups will
not always equal the total number of cases.  Parallel analyses were then
performed for each of the subgroups (i.e., seven separate analyses).
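
Pairwise deletion, as mentioned above, means that each correlation is computed from every case that has both of the two variables involved, so different cells of the matrix may rest on different numbers of cases. A minimal sketch follows; the data and names are hypothetical and missing values are represented by `None`.

```python
from math import sqrt


def pairwise_r(x, y):
    """Pearson r using only the cases where both x and y are present.

    Returns (r, n), where n is the number of complete pairs used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy), n


def intercorrelations(columns):
    """Correlation and pair count for every pair of named variables."""
    names = list(columns)
    return {(p, q): pairwise_r(columns[p], columns[q])
            for i, p in enumerate(names) for q in names[i + 1:]}


# Hypothetical scores for five officers; None marks a missing value.
data = {
    "subtest_a": [1.0, 2.0, 3.0, 4.0, None],
    "subtest_b": [2.0, 4.0, 6.0, None, 10.0],
    "obc_grade": [3.0, 1.0, None, 4.0, 5.0],
}
for (p, q), (r, n) in intercorrelations(data).items():
    print(f"{p} x {q}: r={r:+.2f} (n={n})")
```

Because each coefficient can rest on a different subset of cases, a matrix built this way uses more of the data than listwise deletion, at the cost that the correlations no longer all describe exactly the same group of officers.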

Results  and  Discussion 

The correlations between each of the OEB subtests and Officer Basic
Course final course grades are shown in Table 2 for the total sample.
The multiple correlation of all seven subtest scores with the criterion
is also presented.  The same data are presented for each of the seven
analyses in this table.




Table 2


Correlations Between Each Officer Evaluation
Battery (OEB) Subtest and Officer Basic
Course Final Grades for the Total
Sample and for Each Subsample


                                                 Combat     Combat     Service    Male       Female     Black      White
                                                 Arms       Support    Support    Sample     Sample     Sample     Sample
Subtest                               Total      (N=1,536)  (N=903)    (N=397)    (N=2,719)  (N=113)    (N=190)    (N=2,603)

Combat Leadership (Cognitive)         .36**      .31**      .43**      .33**      .36**      .42**      .41**      .29**
Technical Managerial (Cognitive)      .29**      .20**      .29**      .37**      .29**      .33**      .29**      .22**
Career Potential (Cognitive)          .32**      .26**      .40**      .36**      .32**      .33**      .27**      .26**
Combat Leadership (Non-Cognitive)     .16**      .19**      .14**      .01        .16**      -.01       .27**      .14**
Technical Managerial (Non-Cognitive)  .28**      .17**      .19**      .17**      .22**      .22**      .23**      .17**
Career Potential (Non-Cognitive)      .12**      .17**      .08**      -.08       .13**      -.15       .22**      .07*
Career Intent                         .09**      .15**      .03        -.08       .09**      .16        .16**      .10**

Multiple Correlation                  .42**      .38**      .49**      .47**      .41**      .55**      .49**      .34**

*  Significant at the .05 level.
** Significant at the .01 level.

All zero order correlations between each of the subtest scores and
OBC final course grades were significant at the .01 level with the
exception of the Career Intent subtest which yielded a low positive
correlation of .09 with the criterion that was significant only at the .05
level.  The multiple correlation of the seven subtest scores with the
criterion of .42 was significant at the .01 level for the total sample.

In the Combat Arms branches, all of the seven subtests yielded zero
order correlations with the criterion that were significant at the .01
level and a multiple correlation of .38.  Six of the subtests yielded
zero order correlations significant at the .01 level in the Combat
Support branches; the only exception was the correlation between the
Career Intent subtest and the criterion.  A multiple correlation of .49
was obtained between the seven subtests and the criterion for this
sample.  All three of the OEB cognitive subtests were significantly
correlated with the criterion in the Service Support branches (at the .01
level) as well as the Technical Managerial (Non-Cognitive) subtest.
Small negative or negligible correlations were derived for the remaining
three scales with the criterion.  The multiple correlation for this
subsample was .47.

A  test  for  the  significance  of  differences  was  performed  among  the 
sets  of  zero  order  correlations  for  the  three  types  of  branches  as  well 
as  test  of  significance  of  the  difference  among  the  multiple  correla- 
tions.   The  three  cognitive  scales  of  the  OEB  yielded  significantly  lower 
correlations  (p  less  than  .01)  in  the  Combat  Arms  branches  than  in  the  Combat 
Support  branches.     There  was  not  any  significant  difference  between  the 
Combat  Support  branches  and  the  Service  Support  branches  in  terms  of  the 
predictive  effectiveness  of  these  three  subtests.    The  Career  Potential 
(Non-Cognitive)  scale  yielded  a  higher  (significant  at  the  .05  level) 
zero  order  correlation  with  the  criterion  for  the  Combat  Arms  branches 
than  for  the  Combat  Support  branches.     The  Career  Intent  subtest  also  yielded 
a  significantly  greater  correlation  with  the  criterion  in  the  Combat  Arms 
branches than in the Combat Support branches; the difference was significant
at  the  .01  level.     The  multiple  correlation  of  .38  for  the  Combat  Arms 
branches  was  significantly  lower,  at  the  .01  level,  than  the  multiple  cor- 
relation obtained  in  the  Combat  Support  branches. 

For  the  male  sample  all  seven  subtests  yielded  zero  order  correlations 
with  the  criterion  that  were  significant  at  the  .01  level.     The  multiple 
correlation  was  .41  for  this  sample.     In  the  female  sample,  four  sub- 
tests, the  three  cognitive  subtests  and  the  Technical  Managerial  (Non- 
Cognitive)  subtest,  yielded  zero  order  correlations  with  the  criterion 
that  were  significant  at  the  .01  level.     The  corresponding  multiple  cor- 
relation for  the  sample  was  .55.    There  were  not  any  significant  differences 
between  the  zero  order  correlations  for  the  two  samples  and  there  was  not 
any  difference  between  the  two  multiple  correlation  coefficients. 




The  zero  order  correlations  between  each  of  the  OEB  subtests  and  the 
criterion,  as  well  as  the  resulting  multiple  correlation  coefficient  of  .49, 
were  all  significantly  different  from  zero  at  the  .01  level  for  the  sample 
of  black  officers.    For  the  sample  of  white  officers  all  of  the  zero  order 
correlations  with  the  criterion  were  significant  at  the  .01  level  with  the 
exception of the Career Potential (Non-Cognitive) subtest that yielded a
correlation  of  .07  with  the  criterion  that  was  significant  at  the  .05 
level.    The  resulting  multiple  correlation  of  .34  for  the  sample  of  white 
officers  was  significant  at  the  .01  level. 

The  results  of  this  research  indicate  that  the  Officer  Evaluation 
Battery  (OEB)  is  a  useful  predictor  of  final  course  grades  in  the  Officers 
Basic  course.     Some  fluctuations  occur  in  the  different  samples  but  these 
are  probably  a  function  of  varying  sample  sizes  and  sample  characteristics. 
Generally,  the  OEB  appears  to  have  utility  in  predicting  the  performance 
of  junior  officers  in  acquiring  skills  and  knowledges  necessary  in  their 
performance  as  Army  officers. 

ASSESSMENT CENTER VARIABLES AS PREDICTORS
OF ON-JOB PERFORMANCE CHARACTERISTICS

A  set  of  eight  hands-on  tests  and  a  semi-structured  inter- 
view administered  in  an  assessment  center  was  developed  by 
Siegel and Wiesen to supplement the ASVAB for selection and
assignment  of  General  Detail  personnel  in  the  Navy.    After  an 
experimental  administration  of  the  battery  to  140  male  enlisted 
personnel,  follow-up  was  carried  out  in  the  Fleet  to  validate 
the  tests  against  four  supervisory  ratings  of  on-job  performance. 

In general, assessment center variables when used separately
had about the same predictiveness for job performance criteria
as the ASVAB, and when used in conjunction with the ASVAB increased
the shrunken multiple regression coefficients by .05 to .32.
The  tests  which  were  useful  for  supplementing  ASVAB  were  the 
Semi-structured  interview  and  measures  of  attentional  time- 
sharing and  coordinative  speed  and  accuracy.    Shrunken  multiple 
validity coefficients of batteries composed of the five most
predictive operational and assessment center variables ranged
from .38 to .75 for the four supervisory ratings of on-job
performance. 

Correction of the validity coefficients for attenuation due to
unreliability in supervisors' marks substantially increased the
shrunken multiple regression coefficients. The findings
suggest that when compensation is made for the substantial
inherent unreliability in supervisors' marks, predictive
validities of optimally selected batteries of written tests
and assessment center variables account for most of the
reliable variance in supervisors' ratings of General Detail
personnel (GDPs).

Paper-and-pencil  tests  used  for  selection  of  personnel  for  unskilled 
and semi-skilled labor and for trades jobs have frequently been found to
have low predictive validities for criteria of on-job performance.
Anderson, Rousch, and McClary (1973), in a study of coil winders, found
that none of the paper-and-pencil tests of the GATB correlated significantly
either with supervisors' ratings of overall performance or with
production records. Navy studies (Cory, 1976a, 1976b, and Cory, Neffson,
Rimland, & Thomas, 1978) have generally found validity coefficients of
paper-and-pencil tests in the personnel classification battery with
supervisors' ratings of on-job performance for unskilled and semi-skilled
types  of  positions  which  ranged  from  .15  to  .20.    Maximum  shrunken 
multiple  correlations  of  the  set  of  classification  tests  for  these 
positions  with  on-job  performance  generally  have  been  found  to  range 
between  .20  and  .25. 

Ghiselli, in The Validity of Occupational Aptitude Tests (1966), a
comprehensive survey, reported average validity coefficients of
paper-and-pencil measures of the types used in the ASVAB with performance
proficiency in skilled trades jobs which ranged from .18 to .26. He also
reported correlations with these criteria of .20 to .25, on the average, for
measures of finger and hand dexterity and of .29 for measures of
personality.

When  the  Ghiselli  data  were  reclassified  into  categories  of  Skilled, 
Semi-skilled  and  Unskilled  jobs,  additional  interesting  relationships 
were  found.    Thus  the  average  validity  coefficients  of  paper-and-pencil 
tests  generally  decreased  from  Skilled  to  Semi-skilled  to  Unskilled  jobs, 
except  for  measures  of  Perceptual  Speed,  where  the  direction  of  the 
relationship  was  reversed.    Personality  tests  also  had  higher  average 
validities  for  Semi-skilled  and  Unskilled  jobs  than  for  Skilled  jobs,  but 
the coefficients of Finger and Hand Dexterity measures were about the
same for the three types of jobs. The obvious conclusion from Ghiselli's
findings is that broadening the set of predictors to include measures of
coordination and dexterity and of personality together with the
paper-and-pencil measures of the ASVAB is likely to improve the Navy's
ability to select and classify personnel for unskilled and apprenticeship
types of jobs.

For  this  reason,  Siegel  and  Wiesen  (1977)  developed  for  the  Navy  a 
battery of tests to be used to assign the personnel who are not sent to
Navy Technical schools. These are roughly the individuals in the bottom
25 percent of enlisted personnel in terms of mental ability. These
individuals are usually assigned as General Detail personnel (GDPs) to commands.
There  they  work  in  manual  labor  and  semi-skilled  types  of  maintenance  and 
housekeeping  jobs  with  the  eventual  objective  of  training  on-the-job  for 
positions  in  the  trades  or  technical  areas.    GDPs  do  not  normally  receive 
formal  academic  training  for  jobs  following  completion  of  Recruit  Training 
and  a  two-week  general  apprenticeship  training  course. 

The Siegel and Wiesen tests were administered in an assessment center
setting which was denominated a "Technical Classification Assessment
Center" (TCAC) after the types of jobs for which it was designed to select. The
purpose  of  this  paper  is  to  describe  the  predictive  validities  for  on-job 
performance  of  General  Detail  personnel  of  the  assessment  center  variables 
in  comparison  with  those  of  the  classification  test  scores  and  biographical 
variables  which  were  available  operationally. 

Data  Collection 

During  November  and  December  1975,  140  male  enlisted  graduates  from 
Recruit  Training  at  the  Naval  Training  Center  in  San  Diego  were  examined 
at the TCAC. A description of the testing results as well as the
development and characteristics of the measures in the TCAC is given in Siegel
and Wiesen (op. cit.). Two separate follow-ups were carried out to
collect on-job performance marks on these personnel.

Thus in November 1976 survey questionnaires were sent out to commands
in the Fleet to which personnel from the Siegel and Wiesen study were
currently assigned. These special questionnaires collected on-job
performance marks for the men from their current supervisors. At the same
time the personnel office for the command was requested to forward a
record of the last set of operational performance marks that the man had
received. Three months later, in February 1977, the request was repeated
to commands which had not responded to the initial questionnaire.
Finally, in August 1977, approximately six months after the first set of
supervisors' marks was collected, a second follow-up was carried out to
collect a new set of the same criterion marks.


Assessment  Center  Variables 

Eight  job  performance  tests  and  a  semi-structured  interview  were 
used  in  the  TCAC,  from  which  29  scores  were  derived.     In  addition  four 
global  ratings  were  made  by  the  assessment  center  staff.    The  assessment 
center  tests  together  with  their  scores  are  briefly  described  in  the 
following  slides  and  commentary: 

Slide #1-  1. Conceptual Integration/Application--a troubleshooting problem
in which a simulated hypothetical system was described together with
Slide #2-  possible malfunctions and their causes. Then a series of malfunctions
was presented and the examinees were asked to identify, on the basis of
the symptomatic conditions, the causes. Score was the number of correct
answers.

2. Inspection/Sort--a timed test consisting of 90 items, each being
one of six types. The task was to sort the items by type and to reject
those which had imperfections or did not closely match the type definition.
Scores computed consisted of total numbers and percentages of items
correctly  sorted/rejected  and  incorrectly  sorted/rejected  and  an  unweighted 
composite  of  the  number  of  items  correctly  sorted  plus  the  number  of  items 
correctly  rejected. 

3. Reliability--a variation on one of the Hartshorne and May exercises
(1930) required the threading of 15 needles and self-reporting of the
number of needles successfully threaded. Since the eyes of five of the
needles were blocked by clear plastic and consequently could not be
threaded, any score above ten was considered to be a lie. Score was a
binary variable coded "1" for a truthful response and "0" for a lie.

4. Tool and Object Nomenclature, Use and Recognition--a test in which
unusual tools and objects from Navy life were presented and briefly
discussed. Subsequently three 15-item true-false tests covering the material
were administered. Scores were computed measuring examinee's ability to
associate an object with its (1) use and (2) name, and (3) his ability to
associate  its  name  with  its  use.    An  unweighted  sum  of  these  scores  served 
as  a  fourth  variable. 

Slide #3-  5. Dual Task--a test designed to measure an individual's ability to
Slide #4-  carry out attentional time-sharing while doing simultaneously two separate
tasks. The test required monitoring a control panel while fabricating a
pipe assembly. Cues presented on the control panel specified changes to
be made in settings of the panel. Scores were computed for the number of
parts assembled correctly and the number of panel settings performed
correctly together with the response latencies for execution of the panel
settings. 

Slide #5-  6. Coordinative Speed and Accuracy--a timed test using a simple
wiring task. After a brief instruction and practice session was given,
the examinee's task was to connect wires between terminals located on
separate panels. Directions for the interconnections were shown in a
Slide #6-  wiring diagram and a color-coding chart. Scores were computed for the
total number of connections which were correct.

7. Level of Aspiration--a dart throwing task having three sets of
trials. Prior to each trial the examinee estimated the score which he
would obtain for the trial. Variables scored included the candidate's
estimated score for the first trial, the sum of the estimated scores for
the first and the second trials and the total number of times the
estimated score was lower than the score received on the previous trial
(considered to be a measure of pessimism). In addition the examining
staff  recorded  binary  global  marks  for  each  candidate  indicating  the 
presence  or  absence  of  three  attributes:    realism,  pessimism  and  optimism. 

8. Social Interactive Evaluation--a group task covering a simulated
ammunition storing problem in which members of the group transported sand
in buckets to and from bins over a course which had bottlenecks and
difficult transportation points. Three timed trials were interspersed with
team planning sessions which were designed to critique and improve team
coordination. Scores computed were algebraic sums of the positive and
negative behaviors expressed by the individual in interacting with the
group during each of the following: (a) the first trial, (b) the first
planning session and the second trial, (c) the second planning session
and the third trial, and (d) all trials and planning sessions.

9.  Interview--a  semi-structured  interview  conducted  by  a  2-person 
panel.    General  topics  and  extent  of  coverage,  but  not  the  individual 
questions,  were  specified  for  the  interview.    Ratings  were  made  on  16 
categories  covering  interest,  personality  characteristics,  and  motivation. 
Based  on  them  a  mean  evaluation  on  the  interview  was  computed  and  global 
scores  were  made  on  the  candidate's  ability  on  each  of  the  following 
dimensions: (a) learning, (b) psychophysical/motor, and (c) social/
motivational. 

In addition, the following summary evaluations based on the total
findings  of  the  Assessment  Center  were  made:     (a)  global  evaluation  of 
ability  of  the  examinee  to  perform  on-the-job,  (b)  algebraic  sum  of  the 
positive  and  negative  comments  about  the  examinee  recorded  during  the 
testing,  and  (c)  number  of  discussions  and  votes  of  Assessment  Center 
personnel  required  to  arrive  at  agreement  concerning  evaluation  decisions. 


Operationally-derived  Variables 

A  description  of  the  operational  classification  tests  and  the  bio- 
graphical measures  which  were  used  in  the  study  is  shown  in  the  next 
slide. 

Slide #7-  The six operational test scores used were from the personnel
classification battery used by the Navy at the time. These tests were
predecessors to ASVAB tests, but their areas of measurement and
characteristics were very similar to those of tests in the ASVAB battery.

Although  the  four  biographical  measures  shown  (variables  7  to  10, 
inclusive)  were  collected  specifically  for  the  present  study,  the  first 
three  variables  are  present  in  Navy  operational  records  and  could  be 
used  for  selection  purposes,  if  desirable.    However,  because  of  stric- 
tures in  the  Privacy  Act,  the  last  variable  would  probably  not  be  avail- 
able to  the  Navy  at  the  present  time.    Fortunately,  however,  it  proved 
not  to  be  useful  as  a  predictor  anyway. 

Criteria 

Twelve of the criteria which were used for the study are described
Slide #8-  on the next slide. Of these criteria, Professional Performance, Military
Behavior, Military Appearance, and Adaptability describe specific aspects
of  behavior,  or  traits.    Ratings  for  these  traits  were  collected  from  two 
sources:     (1)  the  operational  performance  ratings  and  (2)  the  ratings 
collected  on  the  special  questionnaire.    Two  global  performance  marks, 
AV-Special and AV-Op, consisted of unweighted averages of the four trait
marks.    AV-Special  was  computed  from  the  marks  given  on  the  Special 
Questionnaire.    In  contrast  AV-Op  was  computed  from  the  man's  official 
performance  marks.    Two  other  global  ratings,  OVER  and  REEN,  consisted  of 
single-element  marks  which  were  collected  from  the  special  questionnaires. 

Analysis 

After the questionnaire returns had been merged with the records
from the TCAC, test-retest reliability coefficients were computed
for supervisors' marks. Then zero-order and multiple-regression validity
coefficients were computed for the four global criteria which were
collected on the first follow-up. A step-wise procedure, the accretion
method, was used to compute multiple regression coefficients, and estimates
of the shrunken validities were computed using a technique recommended by
Thiel (1971).
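
As an illustration of the shrinkage step, the sketch below applies the familiar degrees-of-freedom adjustment to R squared. The paper does not state the formula, so treating this adjustment as the technique recommended by Thiel (1971) is an assumption, and the function name `shrunken_r` and the example values (R = .30, 100 cases, one predictor) are hypothetical.

```python
import math


def shrunken_r(r, n, k):
    """Return a multiple correlation shrunk for n cases and k predictors.

    Assumed formula (not given in the paper):
        adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    """
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    adjusted_r2 = 1.0 - (1.0 - r * r) * (n - 1) / (n - k - 1)
    # A negative adjusted R^2 means nothing survives the shrinkage.
    return math.sqrt(adjusted_r2) if adjusted_r2 > 0 else 0.0


# Hypothetical example: R = .30 from 100 cases with one predictor.
print(round(shrunken_r(0.30, 100, 1), 3))
```

The adjustment grows as predictors are added or as the sample becomes smaller, which is why shrunken values are the appropriate ones to compare across batteries of different sizes.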

For each criterion the predictor set used for multiple regression was
restricted to those variables whose zero-order coefficients were
statistically significant. Additional restrictions imposed at each step were
(1) that the F ratio of the incremental variation in the criterion
predicted by the independent variable selected for the step with the
unpredicted variation of the criterion was >4.5, and (2) that the proportion
of  the  variance  of  the  independent  variable  which  was  not  explainable  by 
the  variables  already  selected  was  >.30.    These  restrictions  were  imposed 
in  order  to  limit  the  variables  selected  to  those  which  made  real  as 
opposed  to  chance  contributions  to  the  predictiveness  of  the  battery. 
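
The two entry restrictions can be sketched as a simple check. This is a minimal sketch, assuming the usual form of the incremental F ratio (gain in R squared over the residual mean square); the function names, argument names, and example values are hypothetical and are not taken from the regression program used in the study.

```python
def incremental_f(r2_old, r2_new, n, k_new):
    """F ratio for the gain in R^2 when one predictor is added.

    k_new is the number of predictors in the equation after the addition,
    and n is the number of cases.
    """
    residual_mean_square = (1.0 - r2_new) / (n - k_new - 1)
    return (r2_new - r2_old) / residual_mean_square


def may_enter(r2_old, r2_new, n, k_new, tolerance, f_min=4.5, tol_min=0.30):
    """Apply both restrictions: F ratio above 4.5 and unexplained
    predictor variance (tolerance) above .30."""
    return incremental_f(r2_old, r2_new, n, k_new) > f_min and tolerance > tol_min


# Hypothetical step: R^2 rises from .10 to .18 when a second predictor,
# half of whose variance is independent of the first, is added (n = 100).
print(may_enter(0.10, 0.18, 100, 2, tolerance=0.50))
```

The tolerance check keeps out variables that largely duplicate those already selected, even when their incremental F ratio happens to be large.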

In  addition  a  hierarchical  selection  mode  was  employed  in  which 
variables were made available to the regression program a set at a time
for the three sets of variables: (1) operational tests, (2) biographical
variables, and (3) assessment center variables. The first two of
these sets were composed of variables which were or which could be
derived operationally, and the last set contained the variables which
were being used experimentally as predictors. Thus the hierarchical
mode permitted the computation and evaluation of the incremental validity
added  by  biographical  variables  and  by  assessment  center  variables  to 
the  maximum  validities  available  from  the  tests  in  the  operational 
classification  battery. 

Results 


At  the  time  of  the  first  mailout,  14  persons  in  the  sample  had  been 
discharged  or  were  carried  as  deserters.    Questionnaires  were  returned 
for  106  personnel,  an  85  percent  return  rate  for  the  125  who  remained  in 
the  Navy  at  the  time.    For  the  second  follow-up  31  personnel  had  left  the 
service  or  were  deserters  and  71  questionnaires  were  returned,  a  66  per- 
cent return  rate.    The  return  rate  for  the  second  follow-up  undoubtedly 
was  lowered  because  the  usual  second  mailout  to  non-respondents  was 
omitted  in  order  to  expedite  the  study. 

Zero-Order  Validities  of  Operational  and  Assessment  Center  Variables 

Zero-order validity coefficients of the operational and the assessment
center variables which had statistically significant coefficients
Slide #9-  for any of the four global performance marks are shown in the next slide.
Twenty-eight of the 136 coefficients were statistically significant. For
the operational test, biographical, and assessment center variables,
respectively, 21, 19, and 15 percent of the predictors were statistically
significant. Coefficients of the statistically significant variables
ranged  from  .19  to  .34  for  the  operational  variables  and  from  .21  to  .50 
for  the  assessment  center  variables.    In  general  the  lowest  values  were 
for  REEN  and  the  highest  values  were  for  AV-Op.    ARI  and  YRED  were  the 
major  operational  variables  which  were  significantly  predictive  of  super- 
visors' marks.    Of  the  ten  assessment  center  variables  which  had  statis- 
tically significant  validities  for  global  on-job  performance,  four  were 
significant  for  only  one  criterion  and  six  were  significant  for  two  or 
more  criteria.     Six  of  the  nine  assessment  center  tests  had  statisti- 
cally  significant  predictive  relationships  with  supervisors'  global  marks. 
These  tests  were  Coordinative  Speed  and  Accuracy,  Inspection/Sort,  Tool  and 
Object  Naming,  Dual  Task,  Level  of  Aspiration,  and  Mean  Interview  Rating. 

Maximally  Predictive  Sets  of  Operational  and  Experimental  Variables 

The shrunken, step-wise multiple correlation coefficients for maximally
predictive sets of the operational, biographical, and the assessment
Slide #10-  center variables are shown in the next slide. Each row in the table
represents  the  addition  of  a  predictor.    The  total  number  of  predictors 
selected  for  a  criterion  at  any  one  point  is  shown  by  reading  down  the 
column  to  that  point. 

These data indicate that batteries formed from ARI and YRED of the
operational variables had shrunken validity coefficients ranging from
.33 to .42 for the four global marks. Assessment center variables added
from .05 to .33 to bring about maximum shrunken validity coefficients for
these criteria which ranged from .38 to .75. Thus operational variables
accounted for a maximum of 11 to 18 percent of the variance of supervisors'
marks. Addition of assessment center variables to this battery would
increase the predictive accuracies so that from 14 to 56 percent of the
variance could be predicted. The predictive accuracies for the two
supervisory marks which were composites, AV-Special and AV-Op, were
particularly high. In an analysis which has been set forth elsewhere, it
was concluded that the higher validities of the composite marks resulted,
at least in part, from their greater reliability.
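
The percentages of variance quoted above follow from squaring the shrunken validity coefficients. A short check of that arithmetic, with a hypothetical helper name:

```python
def percent_variance(r):
    """Percent of criterion variance accounted for by a validity coefficient."""
    return round(100 * r * r)


# Shrunken validities quoted in the text: operational battery (.33 to .42)
# and operational plus assessment center battery (.38 to .75).
for r in (0.33, 0.42, 0.38, 0.75):
    print(r, percent_variance(r))
```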

Differences  in  the  types  of  variables  selected  for  the  maximally 
predictive  batteries  for  the  four  criteria  suggest  that  there  were  differ- 
ences in  the  characteristics  of  the  criteria  as  they  were  perceived  by 
supervisors.    The  single-element  criteria,  OVER  and  REEN,  seem  to  be 
largely  focussed  on  professional  competence.    The  variables  which  were 
maximally  predictive  for  these  criteria  were  cognitive  tests,  years  of 
education, and measures of accuracy of perception and execution in
hands-on situations. In contrast, the two composite marks reflected not only
these characteristics, but also characteristics of personality and
attitude.

Computation of Attenuated Values for the Predictive Validities

Correction  of  the  validities  of  the  supervisors'  marks  for  unreli- 
ability in  the  criteria  was  also  carried  out  in  order  to  provide  more 
realistic  estimates  of  the  actual  predictive  validities  of  the  operational 
and assessment center measures. For this purpose the test-retest
reliability coefficients of the eight trait and the four global marks were
Slide #11-  computed. These coefficients are shown in the next slide. As you may
recall, the coefficients were computed by correlating each of the
performance marks received for the first follow-up with the same variable from
the second follow-up, which was collected approximately six months later.
The statistics in the table are shown for a Total (T) Sample and for a
Diverse (D) Sample, that subgroup for whom the supervisor completing the
second questionnaire was different from the one completing the first
questionnaire. Ninety-two percent of the personnel for whom identifying
information  for  supervisors  was  available  were  in  the  D  Sample.  There 
are  no  entries  in  the  D  Sample  for  operationally-derived  marks  because 
it  was  considered  to  be  not  desirable  to  collect  information  identifying 
the  supervisors  completing  the  operational  marks. 
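
The test-retest computation described above is an ordinary product-moment correlation between the two sets of marks. A minimal sketch follows; the marks shown are invented for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of marks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical marks for five men on the first and second follow-up.
first_followup = [3.2, 3.6, 3.4, 3.8, 3.0]
second_followup = [3.4, 3.4, 3.6, 3.6, 3.2]
print(round(pearson_r(first_followup, second_followup), 2))
```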

In general the test-retest reliabilities were quite low. They
ranged from .16 to .58 for the trait marks and from .29 to .55 for the
global marks. The reliabilities of the two global single-element marks
were from .16 to .26 lower than those for the two global composite marks.
AV-Op was the most reliable global mark, and REEN was the least reliable.

The average reliability of the trait marks was .07 higher for the
T Sample than for the D Sample. Although the reliability coefficients of
all  of  the  marks  with  counterpart  values  were  statistically  significant 
for  the  T  Sample,  for  the  D  Sample  two  trait  marks  and  one  global  mark 
were  not  significant.    However,  statistical  tests  indicate  that  differ- 
ences between  counterpart  values  in  the  T  and  the  D  Samples  were  not 
significant.    Also,  in  general,  the  relative  magnitudes  of  the  reliabil- 
ities of  the  trait  and  the  global  marks  were  the  same  for  the  T  and  the 
D  Samples.    Therefore  it  was  felt  that  the  comparisons  could  justifiably 
be  carried  out  on  the  sample  with  the  greater  number  of  degrees  of  freedom, 
the  Total  Sample. 

Attenuated  Zero-order  and  Multiple  Regression  Coefficients 

Zero-order values for the validity coefficients corrected for
unreliability of the supervisors' marks are shown in the next slide.

Slide  #12- 

In general, increases in magnitude of the zero-order coefficients
produced by the correction for attenuation ranged from .09 to .22. Some of
the coefficients in the table are very high. However, the asterisks indicating the
statistical  significance  of  their  values  are  the  same  as  those  shown  for 
the  unadjusted  coefficients  previously  presented. 
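
The correction applied to each zero-order coefficient can be sketched as follows, assuming the standard correction for unreliability in the criterion only (the validity divided by the square root of the criterion reliability). The function name is hypothetical, and the example values are merely chosen from within the ranges reported in the paper.

```python
import math


def correct_for_attenuation(validity, criterion_reliability):
    """Correct a validity coefficient for unreliability in the criterion."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("reliability must be greater than 0 and at most 1")
    return validity / math.sqrt(criterion_reliability)


# Illustrative values: an observed validity of .26 against a criterion
# with a test-retest reliability of .55.
observed = 0.26
corrected = correct_for_attenuation(observed, 0.55)
print(round(corrected, 2), round(corrected - observed, 2))
```

The lower the reliability of the supervisors' mark, the larger the upward adjustment, so the least reliable criteria show the greatest increases.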

Estimates of the attenuated multiple regression coefficients for the
four global marks were made by substituting the attenuated zero-order
validity coefficients into the predictor-criterion intercorrelation matrix
and recomputing the multiple regression statistics. For this step only
variables whose unattenuated coefficients had been significant were made
available  to  the  regression  program. 
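
The substitution step can be illustrated for the simplest case of two predictors, where the multiple correlation has a closed form. This is only a sketch under stated assumptions: the validities, the predictor intercorrelation, and the criterion reliability below are hypothetical, and the study's regression program handled larger predictor sets.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors, given the
    two validities (r1, r2) and the predictor intercorrelation (r12)."""
    r_squared = (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)
    return math.sqrt(r_squared)


def corrected_multiple_r(r1, r2, r12, criterion_reliability):
    """Recompute the multiple correlation after replacing each validity
    with its value corrected for criterion unreliability."""
    scale = math.sqrt(criterion_reliability)
    return multiple_r(r1 / scale, r2 / scale, r12)


# Hypothetical inputs: validities .30 and .19, intercorrelation .20,
# criterion reliability .40.
uncorrected = multiple_r(0.30, 0.19, 0.20)
corrected = corrected_multiple_r(0.30, 0.19, 0.20, 0.40)
print(round(uncorrected, 2), round(corrected, 2))
```

Because only the criterion is corrected, every validity is divided by the same factor, so for a fixed predictor set the corrected multiple correlation in this sketch equals the uncorrected one divided by the square root of the criterion reliability.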

Slide #13-  The recomputed statistics, shown in the next slide, indicate that
when adjustments were made for the unreliability of the criteria, the
accuracy  achievable  from  the  battery  of  tests  and  biographical  variables 
was  very  high.    These  figures  indicate  that  operational  variables 
accounted  for  from  35  to  45  percent  of  the  variance  of  supervisors'  marks 
and  that  an  additional  32  to  59  percent  of  that  variance  would  be 
accounted  for  by  assessment  center  variables.    The  total  set  of  operational 
and  assessment  center  predictors  would  account  for  from  77  to  94  percent 
of  the  reliable  variance  of  supervisors'  marks. 

Although I have used conservative procedures for making these
estimates, the predictiveness of the total battery of tests seems surprisingly
large. However, even if the magnitude of the findings is discounted
somewhat, the data still show these variables to be predicting most of the
reliable  variance  of  the  supervisors'  marks. 

In summary the major findings of the study were:

1. A composite formed from operational classification test scores
and years of education accounted for between 35 and 45 percent of the
reliable variance of supervisors' ratings of on-job performance of General
Detail  personnel.     Addition  of  assessment  center  variables  to  this  battery 
resulted  in  a  total  battery  which  accounted  for  77  to  94  percent  of  the 
reliable  variance  of  supervisors'  marks. 

2.  The  assessment  center  variables  which  were  the  most  useful  as 
predictors  measured  work  accuracy  under  time-sharing  conditions,  speed 
and accuracy of finger-hand dexterity or coordination, classification
accuracy  in  a  hands-on  situation,  and  personality  and  attitudinal  char- 
acteristics. 

3.  Supervisors'  marks  formed  from  composites  of  two  or  more  scores 
were  more  reliable  than  those  which  were  based  on  only  a  single  rating 
element. 

These  findings  will  be  checked  on  a  new  sample  of  1,000  to  which  a 
revised  form  of  the  TCAC  has  been  administered.    In  the  event  the  findings 
hold  up,  it  is  hoped  that  eventually  the  TCAC  may  be  used  to  identify  the 
incoming  GDPs  who  have  potential  for  advancing  into  a  technical  rating 
so  that  these  personnel  can  be  channeled  into  appropriate  assignments 
before  they  are  lost  to  the  system. 

VARIABLES AVAILABLE OPERATIONALLY

CLASSIFICATION TESTS

   1. GENERAL CLASSIFICATION TEST (GCT)
   2. ARITHMETIC TEST (ARI)
   3. MECHANICAL REASONING TEST (MECH)
   4. CLERICAL APTITUDE TEST (CLER)
   5. ELECTRONIC TECHNICIAN SELECTION TEST (ETST)
   6. SHOP PRACTICES TEST (SHOP)

BIOGRAPHICAL VARIABLES

   7. YEARS OF SCHOOLING COMPLETED (YRED)
   8. AGE TO NEAREST BIRTHDAY
   9. DEMERITS IN RECRUIT TRAINING
  10. ARREST RECORD, BINARY CODE
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CRITERIA 


VARIABLE                                          SOURCE (a)

TRAIT SCORES
  PROFESSIONAL PERFORMANCE                        O
  PROFESSIONAL PERFORMANCE                        S
  MILITARY BEHAVIOR                               O
  MILITARY BEHAVIOR                               S
  MILITARY APPEARANCE                             O
  MILITARY APPEARANCE                             S
  ADAPTABILITY                                    O
  ADAPTABILITY                                    S

GLOBAL MARKS
  AVERAGE OF THE OPERATIONAL TRAITS (AV-Op)       O
  AVERAGE OF THE SPECIAL TRAITS (AV-SPECIAL)      S
  OVERALL PERFORMANCE (OVER)                      S
  RECOMMENDATION FOR REENLISTMENT (REEN)          S

(a) O = OPERATIONAL, S = SPECIAL QUESTIONNAIRE
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ZERO-ORDER  VALIDITIES  FOR 
GLOBAL  CRITERIA 


                                                         CRITERION
VARIABLE                                       AV-Op  AV-SPECIAL  OVER   REEN

OPERATIONAL VARIABLES
  ARI                                           26*      21*      30**   28*
  ETST                                          24*      08       10     10
  YRED                                          32**     34**     19*    18

ASSESSMENT CENTER SCORES
  PREDICTED PERFORMANCE IN
    RECOMMENDED ASSIGNMENT                      29*      08       09     08
  NUMBER OF DISCUSSIONS AND VOTES               19       21*      16     21*
  NUMBER OF COMMENTS                            25*      39**     17     17
  INSPECTION/SORT: ACCURACY
    FOR DEFECTIVE ITEMS                        -26*     -05      -10    -13
  TOOL AND OBJECT NAMING: SCORE I               15       09       11     21*
  DUAL TASK: NUMBER OF CORRECT SETTINGS         33**     26**     23*    12
  COORDINATIVE SPEED & ACCURACY: ACCURACY       18       16       25**   21*
  PERCENTAGE ACCURACY                           24*      28**     27**   23*
  LEVEL OF ASPIRATION: REALISM                  50**     06       16     15
  MEAN INTERVIEW RATING                         16       32**     23*    26**

Decimal points omitted from column entries.
*p ≤ .05
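
The significance flags on zero-order validities of this kind can be approximated with a short check. The sketch below assumes a two-tailed test based on the Fisher z transformation with a normal approximation; the paper does not say which test produced its asterisks, and the sample size of 71 is used here only as an illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for a Pearson r from n cases.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate under the null hypothesis of zero correlation.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Validities are printed without decimal points, so "26" means r = .26.
    for r in (0.26, 0.19):
        print(r, round(correlation_p_value(r, n=71), 3))
```

With small samples an exact t test would give slightly different p-values, so borderline entries may be flagged differently from this approximation.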


SETS OF MAXIMALLY PREDICTIVE VARIABLES

PREDICTOR SET / CRITERION   SHRUNKEN R   PREDICTOR                                        N

1. NAVY CLASSIFICATION TESTS
   AV-Op                    23           ARI                                              71
   AV-SPECIAL               18           ARI                                             104
   OVER                     29           ARI                                             105
   REEN                     26           ARI                                             105

2. BIOGRAPHICAL VARIABLES
   AV-Op                    42           YRED                                             71
   AV-SPECIAL               41           YRED                                            104
   OVER                     37           YRED                                            105
   REEN                     33           YRED                                            105

3. ASSESSMENT CENTER VARIABLES
   AV-Op                    62           LEVEL OF ASPIRATION: REALISM                     71
                            70           DUAL TASK: NUMBER OF CORRECT SETTINGS            71
                            75           INSPECTION/SORT: ACCURACY FOR DEFECTIVE ITEMS    71
   AV-SPECIAL               54           NUMBER OF COMMENTS                              104
                            5[illegible] MEAN INTERVIEW RATING                           104
   OVER                     44           DUAL TASK: NUMBER OF CORRECT SETTINGS           105
   REEN                     38           COORDINATIVE SPEED & ACCURACY: ACCURACY         105

Decimal points omitted from shrunken Rs.


Slide  m 


SUPER 


^^^BS'  ^^^^^ 


VARIABLE 


^0  =  OPERATIONAL.  S  =  SPECIAL 
*j<  .05 


In  N 


PROFESSIONAL  PERFORMANCE 

0 

35** 

PROFESSIONAL  PERFORMANCE 

s 

48*** 

64 

43** 

50 

MILITARY  BEHAVIOR 

0 

16 

42 

MILITARY  BEHAinOR 

s 

33** 

64 

26 

50 

MILITARY  APPEARANCE 

(1 

58*** 

42 

MILITARY  APPEARANCE 

48*** 

64 

50 

ADAPTABILITY 

K 

45** 

42 

ADAPTABILITY 

37** 

64 

27 

50 

GLOBAL  MARKS 

AU-Op 

65*** 

42 

AU-SPECIAL 

48*** 

64 

43" 

50 

OVER 

32** 

64 

32* 

50 

REEN 

29* 

64 

24 

50 

•ONNAlfE 


**£<  .01 
'**£<  .001 
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attenuatep  zero-order 
validities  for  global  criteria 


VARIABLE 


ETST 


ASSESSMENT  Cfmtfr  SCORES 

pflEDICTEO  PERFORMANCE  'N 
flECOMMENOEO  ASSIGNMENT 

NUMBER  OF  DISCUSSIONS 

NUMBER  OF  COMMENTS 
INSPECTION/SORT-  ACCURACY 
fOB  DEFECTIVE  ITEMS 

TOOL  AND  OBJECT  NAMING: 
SCORE  I 

"fficlK  NUMBER  OF 
CORRECT  SETTINGS 

CSS!iS'«VJ'^^  SPEED  AND 
ACCURACY:  ACCURACY 

PERCENTAGE  ACCURACY 

LEVEL  OF  ASPIRATION- 
REALISM 

MEAN  INTERVIEW  RATING 


CRITERION^ 
AV-OPAV-SPECIAU  OVER 


REEN 


35* 

30* 

53** 

52* 

1  9 

1R 

1  o 

1R 

43** 

49** 

unr 

oo 

39* 

12 

16 

15 

26 

30* 

28 

39* 

34* 

56** 

30 

32 

-35* 

-07 

•18 

-24 

20 

13 

19 

39* 

44** 

38** 

41* 

22 

24 

23 

44** 

39* 

32* 

40** 

48** 

43* 

67** 

09 

28 

28 

22 

46** 

48** 

41* 

*P<.05 
'*P<.01 
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SETS  OF  MAXIMALLY  PREDICTIVE  ATTENUATED  VARIABLES^ 


a 


PREDICTOR  SET 


CLASSIFICATION 
TESTS 


AV-Op 


AV-SPECIAL 


SHRUNKEN  SHRUNKEN 

R     PREDICTOR    N  R  PREDICTOR 


33  Am 


28 


Aiil 


OVER 


SHRUNKEN 


N        R  PREDICTOR 


104  52 


ABI 


BEEN 

SHRUNKEN 
a      R  PREDICTOR 


105   SI  m 


H 

105 


2.  BIOGflAPHtCAL 
WfllABUS 


59  YRED 


60  im 


104  G7 


YREO 


105     65  VREO 


IDS 


a  ASSESSMENT  CENTEfl 
WBIABLES 


85  lEI/ELOF 
ASPIRATION: 
REALISM 


71 


97    OUAL  TASK;  71 
NUMBER  OF 
CORRECT 
SETTINGS 


NUMBER  OF 
COMMENTS 


104  81 


86    MEAN  INTERVIEW  104 
RATING 


89  COORDINATIVE  104 
SPEED  &  ACCURACY: 
ACCURACY 


OUAL  TASK: 
NUMBER  OF 
CORRECT 
SETTINGS 


105     76  COORDINATll/E 

SPEEO  4  ACCURACY: 
'  ACCURACY 


IDS 


COORDINATIVE  105  83  NUMSER  OF  DISCUSSIONS  105 
SPEEDS  ACCURACY:  S  VOTES 

PERCENTAGE  ACCURACY 


87  MEAN  INTERVIEW 
RATING 


]05 


94    OUAL  TASK:  104 
NUMBER  OF 
CORRECT  SETINfiS 


92  INSPECTION/SORT: 
ACCURACY  FOR 
DEFECTIVE  ITEMS 


IDS 


DECIMAL  POINTS  OMITTED  FROM  SHRUNKEN  fls 
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USING  AN  ASSESSMENT  CENTER  TO  PREDICT  LEADERSHIP  COURSE 
PERFORMANCE  OF  ARMY  OFFICERS  AND  NCOs 


Frederick N. Dyer
Richard E. Hilligoss
Army Research Institute Field Unit
Fort Benning, Georgia

INTRODUCTION 

The assessment center concept involves the immersion of an individual
in situations which simulate those he would face if he were selected for
entry or promotion and assessment of his performance in this simulation.
It has been widely used in industry and business to select personnel for
high level positions. In 1973-1974 the U.S. Army Infantry School (USAIS)
Assessment Center (ACTR) assessed students from the Infantry Officer
Advanced Course (IOAC), the Infantry Officer Basic Course (IOBC) and the
Advanced NCO Educational System (ANCOES) to determine the feasibility of
the assessment center as a technique for leadership development and
leadership prediction. It also assessed students from the Branch
Immaterial Officer Candidate Course (BIOCC) to determine the feasibility of
the assessment center concept as a selection device. Dyer and Hilligoss
related the ACTR scores on these Officers and NCOs to ratings of field
leadership obtained six months following completion of leadership training
and assignment to new duty stations. These ratings were made by
supervisors, peers, and subordinates of the former assessee. Prediction of
this field leadership criterion was poor. In fact, the more assessor time
that went into assessment of the individual the poorer the correlation with
this field leadership rating criterion for that exercise. This was true
despite high reliabilities of both the ACTR measures and the field
leadership ratings. The latter was indicated by high correlations between
the 6-month ratings and ratings made on the same individuals at 18 months
following assignment to new units. Self-description instruments did a much
better job than ACTR exercise assessor ratings in predicting the leadership
ratings. It appeared that the ratings made by subordinates, peers, and
superiors were strongly influenced by the leader's self-perception of his
leadership skills.
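
The reliability evidence described above, a high correlation between ratings of the same individuals at two points in time, is a test-retest correlation. A minimal sketch of that computation follows; the rating values are invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical leadership ratings of five leaders at 6 and 18 months.
    six_month = [3.1, 4.0, 2.5, 3.8, 4.4]
    eighteen_month = [3.0, 4.2, 2.7, 3.5, 4.5]
    print(round(pearson_r(six_month, eighteen_month), 2))
```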

The purpose of the present paper is to examine the utility of the ACTR
measures for prediction of another criterion, namely, the end-of-course
grade obtained by the assessee in the leadership course that he completed
immediately after going through the assessment center.

METHOD 


ASSESSMENT  CENTER  PERSONNEL 

The assessors consisted of six Majors, seven Captains, two
Lieutenants, three Master Sergeants, two Sergeants First Class, and one
Staff Sergeant. The assessors were selected by DA using the following
criteria: each man must be in one of the combat arms; each Captain and
above must have had command experience; each Major, Captain, and Sergeant
must have served in combat; and Officers must have an advanced degree in
one of the behavioral sciences. The assessors received training for four
months on principles and techniques in assessment, interviewing and
counseling before beginning their duties. The training included repeated
rehearsals of assessment exercises.

Table 1 presents a summary of assessee characteristics and group
sizes. Assessees reported to Fort Benning one week before their scheduled
USAIS course to participate in the assessment center. They were randomly
selected by DA from all students scheduled for USAIS leadership training.

ASSESSMENT  CENTER  EXERCISES 

The ACTR staff, with assistance from Army Research Institute and HumRRO
scientists, constructed exercises and questionnaires to measure ten
dimensions of leader behavior. Leadership research indicated these
dimensions to be appropriate for the assigned mission and it was believed
these dimensions could be evaluated using the assessment center concept.
These were adaptability, administrative skills, communication skills,
decision making, forcefulness, mental ability, motivation, effectiveness in
an organizational leadership role, social skills, and supervisory skills.
In evaluating possible exercises and exercise concepts, a basic factor of
consideration was that the exercises would place the assessees in uniquely
Table 1

ASSESSEE GROUP CHARACTERISTICS AND SIZES

                                        ASSESSMENT GROUP
Descriptor                 IOBC      IOAC      BIOCC (OCS)   ANCOES

Number assessed            90                  88            87
Number completing
  leadership courses       87                  105           79
Pay grade                  O-1       O-3       E 3-6         E 6-7
Average age                22.6                25.3          33.3
Average years of
  active duty              0.3                 3.3           12.9
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different  situations  while  simultaneously  providing  multiple  opportunities 
for  the  evaluation  of  each  dimension.    Exercises  were  developed  which 
exhibited  situational  diversity,  military  relevance  and  apparent  potential 
for  eliciting  behaviors  related  to  the  designated  dimensions.-'  The 
following  exercises  were  developed: 

Entry Interview: A background interview to elicit information related
to motivation, experience and the assessee's self-knowledge of his
strengths and weaknesses (Time: 65').

Appraisal Interview: An applied exercise in which each assessee
interviewed two others to select one for a position within a battalion.
This interview elicited behaviors related to communication skills, social
interaction and organization of thought (105').

Leaderless Group Discussion: This exercise was a combined individual
and group task in which 6 IOAC assessees were assigned a mission to
distribute year-end funds among the represented directorates, each
attempting to acquire a maximum amount for his own directorate. IOBC,
BIOCC, and ANCOES assessees were assigned a mission to get a soldier from
their unit selected as the Brigade Soldier of the Month and to provide a
rank order of merit list of the available candidates. This exercise
elicited behaviors associated with forcefulness, persuasiveness,
organizational ability and group interaction (140').


Olmstead, J. A., Cleary, F. K., Lackey, L. L., and Salter, J. A.
Development of Leadership Assessment Simulations. Human Resources
Research Organization TR 73-21, September 1973.
In-Basket Exercise (Three versions: IOAC - assessee was placed in the
role of a battalion commander; IOBC/BIOCC - assessee was placed in the role
of a company commander; ANCOES - assessee was placed in the role of a 1st
Sergeant): An in-basket containing many items typical of the appropriate
position was presented to the assessee who had 3 hours to address each item
in the in-basket. This exercise elicited behaviors relating to problem
solving, decision making, work organization and leadership. It was
followed by an interview to discuss reasons for action taken and the
relationship perceived to exist among some of the actions (Exercise 180';
Interview 80').

War Game (IOAC assessees only): This was an assigned-role rotating
leader exercise conducted in two 160-minute sessions. Teams of 6 players
engaged in cost effectiveness analysis in a military force planning
environment. Total costs, R&D, intelligence acquisition, and balanced
offensive/defensive forces were all considered under limited budget and
time constraints. This exercise elicited organizational leadership
behavior (Exercise 320'; Orientation 90').

Radio Simulate (Three versions: IOAC assessees were placed in a company
commander role; IOBC/BIOCC assessees were placed in a platoon leader role
during a civilian emergency situation to ensure that lack of military
experience did not preclude them from participation in the exercises;
ANCOES assessees were placed in the role of acting platoon leaders): It
was a 5-hour exercise using radios as the only means of communication. It
elicited organizational and leadership behaviors (Exercise 300';
Orientation 90').

Assigned Leader Group Exercise (Field Exercise) (IOBC, BIOCC, ANCOES):
This was a 5-hour rotating leader designated exercise involving a team of 6
assessees. There were 6 lanes with a different obstacle provided for each
lane. It elicited emergent leadership, planning and organizational
behaviors (300').

Management Exercise ("Conglomerate"): This was a two-hour exercise
divided into two planning and two trading periods. The 18-man assessment
group was organized into three 6-man groups who competed against each
other. This exercise elicited behaviors relating to emergent leadership,
aggressiveness and social interaction (120').

Writing Exercise: This was an exercise designed to measure accuracy of
information provided, grammar, spelling and completeness. The IOAC
assessees responded to a Staff Action Paper and the other assessment groups
to a discharge action (60').
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PSYCHOMETRIC  TESTS  AND  SELF-DESCRIPTION  INSTRUMENTS 


A survey of tests in general was made revealing many possibilities for
adoption into the assessment program. The primary criterion for selecting
specific tests was relevance of the variables to be tested to the
leadership dimensions of administrative skills, communication skills,
supervisory skills, forcefulness, adaptability, decision making, and mental
ability.

Additional  criteria  used  in  selecting  tests  were:    non-offensive  test 
items,  suitability  in  content  and  format  for  use  with  mature  adults, 
adequacy  of  normative  data  and  theoretical  discussions,  recency  of 
publication  or  revision  and  efficiency  in  test  administration. 

Both cognitive and non-cognitive tests were selected specifically to
(1) allow for the comparison of an individual score with normative data and
(2) verify the results of other assessment measurements. Group tests were
selected in order to minimize the number of assessors and the amount of
time required for each assessment. The psychometric tests and
self-descriptive instruments selected are listed below. The Person
Description Blank was developed for this project. All others are described
in the Mental Measurements Yearbook.

1.  Leadership  Opinion  Questionnaire 

2.  Watson-Glaser  Critical  Thinking  Appraisal 

3.  Nelson-Denny  Reading  Test 

4.  Henmon-Nelson  Test  of  Mental  Ability 

5.  Leadership  Q-Sort  Test 

6. Social Insight Test (Chapin)

7. Work Environment Preference Schedule (Gordon)

8. Strong Vocational Interest Blank

9. Edwards Personal Preference Schedule

10. Person Description Blank


6 

Buros, O. K., The Seventh Mental Measurements Yearbook. Gryphon Press,
Highland Park, N.J., 1972.
Questionnaires to obtain specific background information about the
assessee, and to solicit the assessee's opinion of his assessment
experience, were also developed. The purpose of these questionnaires was
to assist in the overall research effort and to collect suggestions for
improving Assessment Center techniques and administration.

CONDUCT  OF  THE  ASSESSMENT  CENTER 


Assessment activities occupied three-and-one-half days of the
assessee's time. Days typically began at 0700 with activities continuing
to 2100. This allowed collection of a great deal of information in the
short time available, enhanced the "total immersion" experience, and
reduced the effects of outside influences on ACTR performance. Paper and
pencil tests, simulated leadership tasks and interviews were approximately
equally distributed over the three-and-one-half-day period. Certain groups
of assessees returned for feedback counseling from one to three weeks
following their assessment. During this three-hour period their leadership
strengths and weaknesses, as identified in the assessment center, were
communicated and activities were suggested which would lead to correction
of deficiencies.

LEADERSHIP  COURSE  PERFORMANCE 

The assessees were all students in USAIS Leadership Courses and
attended these courses immediately after their assessment. The courses
ranged in length from 12 weeks for the Infantry Officer Basic Course (IOBC)
and the Advanced NCO Educational System (ANCOES) through 14 weeks for the
Branch Immaterial Officer Candidate Course (BIOCC) to 36 weeks for the
Infantry Officer Advanced Course (IOAC). Table 2 lists the number of hours
which were devoted to different subjects in each of these courses.

Tables 3 and 4 illustrate the number of examination points associated
with different activities. The total possible score was 1000 for each of
the courses. Actual means and standard deviations for the total scores
obtained by the assessees are given in Table 5. No data are available for
the variances of subtests of the total score and it is thus impossible to
accurately estimate how much each subtopic added to the total score.
However, the points of the subtest probably reflect to some measure its
contribution.
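
The caveat about subtest variances can be made concrete. A subtest's effective weight in a summed total depends on how much its scores vary and covary with the other subtests, not on its nominal point value alone. The sketch below computes each component's share of total-score variance as its covariance with the total divided by the variance of the total; the subtest names and scores are hypothetical, since the study reports no subtest data.

```python
from statistics import mean


def effective_weights(components: dict[str, list[float]]) -> dict[str, float]:
    """Share of total-score variance contributed by each component.

    For each component the share is cov(component, total) / var(total);
    the shares sum to 1 because the total is the sum of the components.
    """
    n = len(next(iter(components.values())))
    total = [sum(scores[i] for scores in components.values()) for i in range(n)]
    mean_total = mean(total)
    ss_total = sum((t - mean_total) ** 2 for t in total)
    weights = {}
    for name, scores in components.items():
        mean_score = mean(scores)
        cross = sum((s - mean_score) * (t - mean_total) for s, t in zip(scores, total))
        weights[name] = cross / ss_total
    return weights


if __name__ == "__main__":
    # Hypothetical scores for four students on two subtests.
    scores = {
        "field_test": [100.0, 110.0, 120.0, 130.0],
        "quiz": [9.0, 9.0, 10.0, 10.0],
    }
    for name, weight in effective_weights(scores).items():
        print(name, round(weight, 3))
```

A subtest on which nearly every student earns the same score contributes little to the ranking of students, however many points it carries.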

For the most part the instruction was conducted by the lecture method
and testing was traditional paper and pencil multiple choice. The
exceptions are the military stakes and PT testing of the IOBC curriculum.
Table 2

Academic Hours for Four ACTR Groups

Title                            IOBC      IOAC     BIOCC    ANCOES

Combined arms subjects          282.5     510.0     100.0     102.0
Staff subjects                   27.0     193.0      44.0     119.0
General subjects                 83.5     117.5     188.0     106.5
Communications/Electronics       10.0      23.0      11.0      15.0
Unit/Materiel readiness          42.5      44.0      23.0      16.0
Weapons                          73.0      44.0      50.0      18.0
Student Evaluation
  & Counselling                  36.0     100.0     105.0      20.0
Electives                                  45.0                42.0
Guest Speaker program                      18.0

Total                           554.5    1094.5     521.0     438.5
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Table  3 

Composition  of  Total  Score  for  lOBC  and  IOaC  Groups 


lOBC 

IOAC 

Subject 

Points 

Subject 

Points 

Map  reading 

10 

Medical  services  support  quiz 

10 

Pro  facts 

50 

Indoor  land  navigation 

25 

Land  navigation 

(field) 

120 

Leadership  management 

45 

Leadership 

100 

Staff  f unctior.s 

125 

Mil  stakes  Part 

I* 

140 

Nuclear,  Chemical,  Biological 

II* 

170 

operations  tNCB) 

35 

Mil  stakes  Part 

55 

10 

Maintenance  management 

Patrolling 

10 

100 

Engineer 

Patrolling  evaluation 

Army  Physical  Fitness  Test  (APFT)  100 
Communica  t  ion  /  main  t  enanc  e  100 
Written  Performance  100 


1000 


Communications 
Fact  sheet 
Disposition  Form 
Cmt  2  to  Disposition  Form 
Arty 

Graphics  quiz 
Operations 

Company  tactical  oper,  field 
Company  tactical  oper,  field 
Company  tactics 
Bn  defense 
Bn  offense 

Internal  defense  dev 
Aerial  employment 
Memorandum 
Staff  study 

Response  to  nonconcurrence 

Indorsement  military  Itr 

Final  Corap  Part  I 

Bde  defense 

Bde  offense 

Final  Comp  Part  II 


S 
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*"Hands-'On"  performance  test  of  various  equipment. 
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10 
10 
10 
25 
10 
30 
80 
75 
25 
50 
50 
30 
35 
10 
40 
10 
10 
50 
30 
30 
50 


1000 


Table 4

Composition of Total Score for BIOCC and ANCOES Groups

BIOCC                                   ANCOES
Subject                       Points    Subject                          Points

Squad drill performance           60    Land navigation outdoor              40
Platoon drill                     60    Land navigation indoor               40
Oral presentation                 50    Communications                       40
Land navigation field exam        15    Graphics                             10
Phase I Comp                     120    Leadership Group, Medical            55
Land navigation field            120    Weapons                              95
Maintenance management           100    Maintenance                          70
Phase II Comp                    175    Combat Support                       85
Army Physical Fitness Test       100    Mechanized Training                  70
Phase III Comp                   200    Forward observer                     80
                                        Fire direction control (FDC) I       90
                                        Writing Req Mil ltr        [illegible]
                                        FDC II                               80
                                        FDC III                              85
                                        Spot Quiz                            10
                                        Fundamentals of Tac                  35
                                        Cmt 2 to Disposition Form            15
                                        Staff                                85

Total                           1000    Total                              1000
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Table  5 

Mean  and  SD  for  Total  Scores 


Group        N      Mean     Standard Deviation

IOBC         87    857.84         41.56
IOAC         84    839.74         47.10
BIOCC       105    876.53         46.52
ANCOES       79    810.38         54.41
TABLE 6

CORRELATIONS WITH THE CRITERION
OF ASSESSOR RATINGS FOR A LEADERLESS GROUP DISCUSSION


LGD Dimensions                        IOAC    IOBC    BIOCC    ANCOES

Initial Presentation
  Formal oral communication
  Oral organization
  Presentation impact

Group Discussion
  Participation
  Group leadership/facilitation
  Persuasiveness
  Convey information/communication
  Social Concern

  *    **
.05,  .01
RESULTS

The scores obtained from the ACTR fall into the following six classes:

1. Assessor ratings of assessee performance during individual and
group formal exercises such as the In-Basket,

2. Peer rankings of assessees in those formal exercises where a group
of assessees participated together such as the Assigned Leader Group
Exercise,

3. Self-rankings by the assessee of his performance relative to other
group members in these group exercises,

4. Leadership dimension ratings made by an assessor during the Entry
Interview with the assessee,

5. Assessee performance on paper and pencil performance tests, and

6. Assessee self-descriptions on questionnaires and other instruments
such as the Edwards Personal Preference Schedule.

The results will be discussed for each of the above classes of score
and, following this, the classes of ACTR scores themselves will be
discussed and compared on their effectiveness for prediction of the field
leadership ratings criterion. Proportions of successful predictors will be
compared among classes as will the amount of time required by assessors and
assessees to obtain each successful measure. The end result will be an
ordering of the different classes of ACTR measure on their utility for
predicting the criterion.

1. ASSESSOR RATINGS OF ASSESSEE PERFORMANCE DURING FORMAL EXERCISES

Leaderless Group Discussion (LGD)

The Leaderless Group Discussion rating form was in two parts: Initial
Presentation Rating, three items, and Discussion Participation Rating
(DPR), six items. Correlations with the criterion for assessor ratings
during this exercise are presented in Table 6. "Formal oral communication"
was the only significant dimension common to all four assessment groups.
"Oral organization" and "conveys information" each had significant
correlations with the criterion for three out of the four assessment
groups. Only "negative social impressions" failed to have a significant
correlation with the criterion for any of the assessment groups.

Conglomerate Exercise (CONG)
The Conglomerate Game Rating Form consisted of eight items. Each of
the assessor ratings had a significant correlation with the criterion for
at least one of the assessment groups. These correlations are presented in
Table 7. "Leadership emergence," "energy and vigor," and "decision quality"
each had significant correlations with the criterion for three of the four
assessment groups. "Receptivity" and "group facilitation" predicted the
criterion only for the ANCOES assessees and "sensitivity" only for the
BIOCC assessees.


Radio Simulate

The Radio Simulate Leadership Dimension Rating Form is divided into
two parts: Platoon (P) and Battalion (B), each having the same eight
items. The Platoon rater was an assessor who acted as subordinate to the
assessee in the exercise. The Battalion rater acted as his supervisor.
Table 8 presents the correlations with the criterion for assessor ratings
on these items. "Decision making" predicted the criterion for all
assessee groups in the Platoon ratings. "Communication skills" predicted
the criterion for all assessee groups in the Battalion ratings.
"Communication skills" and "motivation" predicted the criterion for three
of the four assessee groups in the Platoon ratings. "Adaptability,"
"motivation," "forcefulness," and "administrative skills" predicted the
criterion for three of the four assessee groups in the Battalion ratings.


In-Basket

Thirteen of the fourteen dimensions showed significant correlations
with the criterion for at least one of the assessment groups. These
correlations of assessor ratings with the criterion are presented in Table
9. "Supervision of subordinates," "attention to detail," and "task
orientation" were significantly correlated with the criterion for all
assessee groups. "Decision making," "use of available information," and
"working with superiors" were significantly correlated with the criterion
for three of the four assessee groups. "Written communication" was
significantly correlated with the criterion for IOBC assessees only, and
"self-confidence" for ANCOES only. "Sensitivity" did not correlate
significantly with the criterion for any assessee group.

Appraisal Interview

The Appraisal Interview consisted of eight dimensions, five of which
predicted the criterion. These correlations are given in Table 10.
"Self-confidence," "use of information," and "accommodation" did not
predict the criterion for any of the assessee groups.

Writing Exercise
TABLE 7

CORRELATIONS WITH THE CRITERION OF ASSESSOR
RATINGS FOR THE CONGLOMERATE EXERCISE


CONGLOMERATE                          Assessment Group
DIMENSIONS                  IOAC     IOBC     BIOCC    ANCOES

Energy & Vigor              .25      .19*     .09      .25*
Leadership Emergence        .22*     .12      .22*     .28**
Oral Communication          .21*     .05      .15      .29**
Decision Quality            .27**   -.04      .18*     .43**
Sensitivity                 .15      .13      .22*     .18
Receptivity                 .09     -.04      .06      .26
Group Facilitation          .17      .16      .15      .27**
Overall Effectiveness       .27**    .19      .12      .23*

  *    **
.05,  .01
TABLE 8

CORRELATIONS WITH THE CRITERION OF ASSESSOR
RATINGS FOR THE RADIO SIMULATE


RADIO  SIMULATE 
DIMENSIONS 

lOAC 

lOBC 

BIOCC 

ANCOES 

Social  Skills  P 

-.04 

.12 

** 

.37 

* 

.22 

Social  Skills  B 

-.07 

* 

.23 

** 

.26 

.17 

Cotranunication  Skills  P 

-.01 

** 

.33 

.31 

** 

.35 

Communication  Skills  B 

* 

.21 

** 

.29 

** 

.39 

** 

.30 

Adaptability  P 

.14 

.16 

** 

.24 

** 

.42 

Adaptability  B 

-.00 

.28 

* 

.18 

** 

.29 

Motivation  r 

m  X.D 

* 

1  8 

.24 

* 

.22 

Motivation  B 

.04 

.20 

* 

.21 

.28 

Forcefulness  P 

.17 

.29 

.11 

Forcefulness  B 

.11 

.21 

.33 

* 

.20 

Decision  Making  P 

* 

39 

** 

.35 

Decision  Making  B 

-.02 

.09 

* 

.19 

A* 

.33 

Administrative 
Skills  P 

.04 

.08 

* 

.23 

ft* 

.40 

Administrative 
Skills  B 

.11 

.34 

* 

.18 

** 

.32 

Effectiveness  in 

Org .  Leadership 
Role  P 

.10 

-.00 

** 

.35 

** 

.28 

Effectiveness  in 

Org. Leadership 
Role  B 

.13 

.11 

** 

.25 

** 

.29 

  *    **
.05,  .01


TABLE 9

CORRELATIONS WITH THE CRITERION OF
ASSESSOR RATINGS FOR THE IN-BASKET EXERCISE


IN-BASKET
DIMENSIONS


lOAC 


Written  Communication 

Planning  &  Organization 

Supervision  of 
Subordinates 

Task  Orientation 

Decisiveness 

Working  with  Superiors 

Personal  actions  and 
Initiative 

Decision  making 

Attention  to  Detail 

Problem  Analysis 

Directing  Ability 

Use  of  Available 
Information 

Self-confidence 


.05,  .01 




TABLE  10 

CORRELATIONS  WITH  THE  CRITERION  OF 
ASSESSOR  RATINGS  FOR  THE  APPRAISAL  INTERVIEW 


APPRAISAL INTERVIEW
DIMENSIONS 


Topic  Selection 
Written  Communication 
Written  Organization 
Planning 

Oral  Communication 


lOAC 


Assessment  Group 
lOBC  BIOCC 


ANCOES 


  *    **

.05,  .01 
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The  Writing  Exercise  consisted  of  four  dimensions.     Three  of  the 
dimensions were correlated significantly with the criterion for at least
one of the assessment groups.

For the IOBC assessees the criterion scores were related to
"accuracy" (r = .24, p < .05) and "completeness" (r = .31, p < .01).

For the IOAC assessees, only "grammar" was significant
(r = .21, p < .05).

Three of the four dimensions predicted the criterion for the BIOCC
assessees: "accuracy" (r = .23, p < .05), "grammar," and "completeness"
(r = .30, .31, p < .01, respectively).

Only one dimension predicted the criterion for the ANCOES assessees:
"completeness" (r = .28, p < .01).

"Completeness" was the only Writing Exercise dimension predicting the
criterion for at least three of the assessment groups. "Spelling" did not
predict the criterion for any of the assessment groups.

Assigned  Leader  Group  Exercise  (ALGE) 

The Assigned Leader Group Exercise rating form was in three parts:
Leader Behaviors (four dimensions), Behaviors Applicable to both Leader and
Follower Roles (three dimensions), and Follower Behaviors (two dimensions).

Three of the Leader Behavior dimensions correlated significantly with
the criterion for at least one of the three assessee groups. (The IOAC
assessees did not participate in this exercise.) Only one of the Behaviors
Applicable to both Leader and Follower dimensions correlated significantly
with the criterion for one of the assessee groups. Each of the two
dimensions of the Follower Behavior items correlated significantly with the
criterion for at least one of the assessee groups.

For the IOBC Lieutenants, the end-of-course criterion was positively
related to good assessor ratings on the Leader Behavior dimensions "planning"
and "leadership" (r = .27, .21, p < .05, respectively). Of the Follower
Behavior dimensions, both "leader emergence" and "group facilitation"
correlated significantly with the criterion (r = .24, .24, p < .05,
respectively).

For the BIOCC assessees "leadership" and "decisiveness" from the
Leader Behavior group were significantly related to the criterion
(r = .20, .18, p < .05). "Physical ability" (r = .18, p < .05) was significant
from the Behaviors Applicable to both Leader and Follower. "Group
facilitation" was the only significant correlation with the criterion of
the Follower Behavior dimensions (r = .23, p < .05).

For the ANCOES Sergeants, the criterion was positively predicted by
two dimensions of Leader Behavior: "planning" and "decisiveness"
(r = .21, .20, p < .05).

Of the Leader Behavior group, "planning," "leadership," and
"decisiveness" were significant for at least two of the three assessment
groups. "Flexibility" was not significant. Of the Behaviors Applicable
to both Leader and Follower Roles, neither "motivation" nor "stress
tolerance" was significant. "Physical ability" was significant, but
for the BIOCC assessees only. Of the Follower Behavior group, "leader
emergence" was significant only for the IOBC assessees. "Group
facilitation" was significant for two of the three assessee groups.

Leader  Game 

Only the IOAC Captains participated in this exercise (it took the
place of the ALGE for this group). The Leader Game was quite successful in
predicting the criterion. In fact, Assessor Ratings for this exercise
provided the highest percentage of successful predictors (78%). Of the
nine dimensions, seven were successful in predicting the criterion:
"organization," "supervisory skills," "participation," "problem
comprehension," "leader emergence," and "overall effectiveness"
(r = .31, .29, .47, .33, .39, .36, p < .01). At the .05 level, "planning"
was significant (r = .25). "Organization" and "flexibility" were not
significant.

2.       PEER-RANKINGS  ON  GROUP  EXERCISES 

Leaderless  Group  Discussion 

The six group members who participated in this exercise ranked all six
members on six dimensions at the end of the exercise. Each of the six
dimensions was significantly correlated with the criterion for at least one
of the four assessee groups. These correlations are presented in Table 11.
"Overall effect" was the only variable which correlated significantly with
the criterion across all four assessee groups. "Oral communication,"
"leadership," and "sociability" correlated significantly for at least
three of the four assessee groups. "Persuasiveness" correlated
significantly with the criterion for the ANCOES assessee group only.

Conglomerate  Exercise 

Peer  ranking  correlations  with  the  criterion  for  this  exercise  are 

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TABLE  11 
CORRELATIONS WITH THE CRITERION
OF PEER RANKINGS FOR THE LEADERLESS GROUP DISCUSSION


                                  Assessment Group
DIMENSION               IOAC     IOBC     BIOCC    ANCOES

Oral Communication      .24*     .21*     .35**    .18
Sociability             .20*     .24*     .25**    .12
Leadership              .23*     .03      .18*     .31**
Idea Quality            .23*     .06      .23**    .11
Persuasiveness          .18      .15      .08      .29**
Overall Effect          .32**    .18*     .25**    .31**

* p < .05, ** p < .01

presented in Table 12. No dimension was significantly correlated across
all assessee groups. "Popularity" and "acceptance" were significant
across three of the four assessee groups. "Conflict" was not significant
for any of the assessee groups.

Assigned  Leader  Group  Exercise 

This exercise was the least predictive of the four exercises which
included peer rankings. However, each of the four dimensions did predict
the criterion for at least one of the assessee groups. None of the four
dimensions predicted the criterion for the ANCOES. These correlations are
presented in Table 13.

Leader Game (IOAC Only)


Peer rankings for the Leader Game were the most predictive peer
rankings. In fact, all of the five dimensions were significant predictors.
These are included in Table 13. These high correlations indicate that
assessees ranked highly by peers on the exercise tended to receive high
end-of-course scores.

3.       SELF-RANKINGS ON GROUP EXERCISES

Leaderless Group Discussion

The assessee included himself in the group rankings for this exercise,
and his self-ranking was also tested as a predictor of the criterion. Only
four of the six dimensions were found to predict the criterion for the IOAC
assessees. These were "oral communication," "leadership," "idea quality,"
and "overall effect" (r = .22, .24, .23, .21, p < .05, respectively).

None of the dimensions predicted the criterion for the other three
assessee groups. "Persuasiveness" and "sociability" did not predict the
criterion for any of the assessee groups.

Conglomerate

Self-rankings for four of the five dimensions were significantly
associated with the criterion on this exercise for the IOAC assessee group.
These were "popularity," "planning," "energy," and "acceptance"
(r = .28, .43, p < .01; r = .21, p < .05, respectively).

None of the dimensions predicted the criterion for the other three
assessee groups. "Conflict" did not predict the criterion for any of the
assessee groups.

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TABLE  12 

CORRELATIONS WITH THE CRITERION OF
PEER RANKINGS FOR THE CONGLOMERATE EXERCISE


DIMENSION 


Popularity 
Energy 

Acceptance 

Planning 


IOAC


Assessment Group

IOBC  BIOCC


.12 

i 

.34 


** 


.35 


** 


.28 


** 


.20 

i 

.27 
.20^ 
,13 


** 


.23 
.11 

.11 

.11 


ANCOES 

.21 
.13 


.19 


.34 


* p < .05, ** p < .01
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TABLE  13 

CORRELATIONS WITH THE CRITERION OF
PEER RANKINGS FOR THE ASSIGNED LEADER GROUP
EXERCISE (IOBC, BIOCC, ANCOES) AND LEADER GAME (IOAC)


                                     Assessment Group
Dimension                   IOAC     IOBC     BIOCC    ANCOES

Social Association                   .29*     .15      -.01
Leadership                  .43**    .23      .39**     .04
Support of Leader           .39**    .15      .27**     .00
Generating Esprit           .24*     .19      .35**    -.17
Problem Comprehension
  (IOAC only)               .50
Overall Effectiveness
  (IOAC only)               .43

* p < .05, ** p < .01
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Assigned  Leader  Group  Exercise 


As for the ALGE Peer Rankings, self-rankings on the ALGE were the
poorest predictor of the criterion. Of the four dimensions only "generating
esprit" (p < .05) was significant for the IOBC assessees. None of
the other assessee groups had any of the dimensions which predicted the
criterion. The IOAC assessees did not participate in this exercise.

Leader Game (IOAC Only)

As for the Assessor Ratings and Peer Rankings, the Leader Game Self
Rankings were the best self-ranking predictors of the criterion. All of
the five dimensions were significant predictors. These were "problem
comprehension," "leadership," "support of leader," "overall effectiveness,"
and "generating esprit" (r = .38, .36, .34, .34, p < .01; r = .19, p < .05,
respectively).

4.  ENTRY INTERVIEW PERFORMANCE EVALUATION

The correlations of entry interview ratings with the criterion are
presented in Table 14. All but two of the fourteen dimensions correlated
significantly with the criterion for at least one of the assessee groups.
"Goal convergence" and "creativity" did not produce any significant
correlations. "Fluency," "asset evaluation," and "liability evaluation"
successfully predicted the criterion for three out of the four assessment
groups. "Sense of humor," "task orientation," and "task motivation"
correlated significantly for the IOBC assessee group only. "Enthusiasm"
and "self-development" correlated significantly for only the BIOCC
assessee group.

5.  PENCIL AND PAPER PERFORMANCE TESTS

The four tests that fall into this category are the Henmon-Nelson Test
of Mental Maturity, the Watson-Glaser Critical Thinking Appraisal, the
Nelson-Denny Reading Test, and the Social Insight Test.

Since these tests strongly reflect previous academic achievement, it
is not surprising that they correlate highly with the end-of-course grade.
Correlations of these scores with the criterion are in Table 15.
Henmon-Nelson Quantitative, Watson-Glaser Critical Thinking, Nelson-Denny
Comprehension, Nelson-Denny Total and Social Insight scores were
significant across all assessee groups. Henmon-Nelson Verbal,
Henmon-Nelson Total and Nelson-Denny Verbal scores were significant across
three of the four assessment groups. Nelson-Denny Reading Rate was
significant for the IOAC assessee group only.

TABLE 14

CORRELATIONS WITH THE CRITERION FOR
ENTRY INTERVIEW RATINGS


Assessment Group


Dimension

IOAC

IOBC

BIOCC

ANCOES

Sense  of  Humor 

«10 

•  IZ 

Expression  of  opinion 

.  1  J 

* 

.20 

Task  Orientation 

-.02 

.25 

.06 

.03 

Asset  Evaluation 

.32 

.19" 

** 

.29 

Liability  Evaluation 

.23 

* 

.24 

.22 

.13 

Task  Motivation 

.03 

.06 

.14 

Effectively  Conveys 
Information 

.28 

.07 

.23 

.16 

Fluency 

.25 

.11 

.27 

** 

.29 

Interest  Range 

.14 

.17 

ft* 

* 

.21 

Enthusiasm 

.02 

.11 

* 

.20 

.09 

Self-Development
Overall  Impression 

.17 
.12 

.07 

5'C* 

.26 

.19* 

J. 

.18" 

.08 
.17 

8:1." 

r 
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TABLE  15 
CORRELATIONS  WITH  THE  CRITERION 
FOR  PAPER  AND  PENCIL  TESTS 


                                     Assessment Group
TEST SCORES                          IOAC     IOBC     BIOCC

Henmon-Nelson Quantitative
Henmon-Nelson Verbal
Henmon-Nelson Total Score
Nelson-Denny Verbal
Nelson-Denny Comprehension
Nelson-Denny Total
Nelson-Denny Reading Rate
Watson-Glaser Critical Thinking
Social Insight Test

* p < .05, ** p < .01




6.       SELF-DESCRIPTION INSTRUMENTS

Edwards Personal Preference Schedule (EPPS)

The EPPS did not provide a particularly large number of correlations
with the criterion even though each dimension was correlated significantly
for at least one of the four assessment groups.

For the IOBC assessees, need for "achievement" correlated positively with
the criterion (r = .23, p < .05).

For the IOAC assessees, need for "achievement" and "dominance"
correlated with the criterion (r = .18, p < .05; r = .34, p < .01,
respectively). Need for "abasement" showed an inverse relationship with
the criterion (r = -.24, p < .05).

For the BIOCC assessee group, needs for "order" and "succorance" (to
have others provide help when in trouble, to seek encouragement from
others, etc.) showed an inverse relationship with the criterion (r = -.23,
p < .01; r = -.21, p < .05).

The ANCOES assessee group had the largest number of significant
correlations between EPPS dimensions and the criterion. Needs for
"abasement" and "nurturance" (to help friends when they are in trouble,
to assist others less fortunate, etc.) showed an inverse relationship with
the criterion (r = -.11, p < .05; r = -.27, p < .01). "Exhibition" and
"endurance" needs correlated positively with the criterion (r = .20, .24,
p < .05, respectively).

No single dimension predicted the criterion across all four assessee
groups. In fact, the only dimensions that were significant predictors
across even two assessee groups were needs for "achievement" and
"abasement."

Work Environment Preference Schedule (WEPS)

High scores on this measure "typify individuals who accept authority,
who prefer to have specific rules and guidelines to follow, who prefer
impersonalized work relationships, and who seek the security of
organizational and in-group identification." Three of the assessee groups
showed significant correlations on this measure with the criterion of
end-of-course grades. For the IOAC, BIOCC, and ANCOES assessee groups,
inverse correlations were associated with the criterion. This inverse
relationship would indicate that those assessees readily accepting
authority tended to receive low end-of-course grades (r = -.28, -.29,
p < .01; r = -.24, p < .05, respectively).

This test score for the IOBC assessee group did not correlate
significantly with the end-of-course-grade criterion.

Leader  Opinion  Questionnaire  (LOQ) 

The LOQ provides two scores: Consideration and Structure. BIOCC
assessees scoring high on Consideration on the LOQ were more apt to receive
a high score on the criterion (r = .24, p < .01). Structure, on the other
hand, was inversely correlated with the criterion (r = -.34, p < .01). No LOQ
scores were significant for the other assessee groups.

Leadership  Q  Sort  (LQS) 

None of the seven dimensions of the LQS correlated significantly with
the criterion for the IOBC assessee group.

For the IOAC assessee group, "leadership values," "technical
information," and "decision making" correlated significantly with the
criterion (r = .27, p < .01; r = .19, .23, p < .05, respectively). Inverse
correlations were obtained for "consideration of others" and "mental
health" (r = -.26, -.31, p < .01, respectively). This would indicate that
IOAC assessees higher in consideration for others and mental health would
tend to have low scores on the criterion.

"Leadership values" and "personal integrity" were LQS dimensions that
correlated significantly with the criterion for the BIOCC assessee group
(r = .25, p < .01; r = .17, p < .05, respectively).

Two of the seven dimensions of the LQS correlated significantly with
the criterion for the ANCOES assessment group. These were "leadership
values" and "technical information" (r = .22, .22, p < .05).

None of the LQS dimensions was significant over all four assessee
groups. "Leadership values" was significant for three of the four assessee
groups. "Personal integrity" and "decision making" were each positively
correlated with the criterion for one assessee group. "Consideration for
others" and "mental health" were also each correlated with the criterion
for one assessee group, but negatively. The dimension "teaching and
communication" did not correlate with the criterion for any of the
assessee groups.

Person  Description  Blank 

Fifty pairs of adjectives were presented to each assessee (e.g., WARY:
1 2 3 4 5 6 7 :GULLIBLE) with instructions to rate himself by circling the
number that best described his position between these polar adjectives.
Twenty-five of these fifty pairs produced significant correlations with the
criterion for at least one of the assessee groups. The pairs of adjectives
and their correlations with the criterion for each assessee group are
presented in Table 16. Positive correlations indicate that persons who
rated themselves higher than average on the rightmost adjective were more
apt to receive high end-of-course grades. Negative correlations indicate
that persons who rated themselves higher than average on the leftmost
adjective were more apt to receive high grades. A negative correlation
does not necessarily mean that people were closer to the "1" end of the
scale than to the "7" end of the scale. It only indicates that persons who
were on the "1" side of the overall average for the item were more apt to
be rated high on the criterion.
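
The point about the sign of a correlation can be illustrated with a small Python sketch. The ratings and grades below are hypothetical values invented for the illustration (they are not ACTR data), and `pearson_r` is an illustrative helper. Every self-rating lies on the "7" side of the scale midpoint, yet the correlation with the grade is negative, because the sign depends only on where each person falls relative to the item average.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: all self-ratings are above the midpoint (4) of the
# 1-7 scale, but those nearer the "1" side of the group average have
# the higher course grades.
self_ratings = [5, 5, 6, 6, 7, 7]
course_grades = [90, 88, 85, 84, 80, 78]

r = pearson_r(self_ratings, course_grades)
mean_rating = sum(self_ratings) / len(self_ratings)
print(f"mean rating = {mean_rating:.1f}, r = {r:.2f}")
```

The mean rating here is 6, well toward the "7" end, while the correlation is negative.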

COMPARISON  OF  DIFFERENT  CLASSES  OF  ACTR  SCORES 

Table 17 presents summary data for all assessee groups for the six
classes of ACTR scores. It can be seen that the number of scores per
assessee (Column 1) varied from 9 for the Pencil and Paper Performance
Tests to 75 for the Self-Description Instruments. The assessor time per
score (Column 4) showed a very wide variation from 10.9 minutes per score
for Assessor Ratings on Formal Exercises to less than one minute per score
for the Self-Description Instruments. The latter small time per score
reflects the assessor time savings that resulted from presenting the
Self-Description Instruments in a group (six assessees) setting. The zero
"assessor times per score" that appear for Peer Rankings and Self Rankings
reflect the fact that these scores were provided by the assessees and did
not require any additional time of assessors beyond that required for the
assessor ratings on these exercises. The "assessee time per score" (Column
6) is prorated over Assessor Ratings, Peer Rankings and Self Rankings.
Thus only a single figure is shown for this column for these three
categories. It can be seen that assessee time per score is relatively long
for the Formal ACTR Exercises. Assessee time per score is longest for the
Pencil and Paper Performance Tests and shortest for the Self-Description
Instruments.

A successful predictor is defined in this report as one which has a
correlation with the criterion that is significant at the .05 level. In
Column 2 of Table 17 the average number of successful predictors per
assessee is given and Column 3 shows the percentage that this is of the
total number of scores for the assessee. The assessor ratings of formal
exercises represent the most typical ACTR data and their collection is the
raison d'etre of an assessment center. The high percentage of predictions
from these rating scores compared to interviews and to questionnaires
supports the assessment center concept.
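
The Column 3 percentage is the average number of successful predictors divided by the number of scores per assessee. A minimal Python sketch of that arithmetic follows, using the Column 1 and Column 2 figures as printed in Table 17. The function name and the dictionary layout are illustrative choices, not part of the original report.

```python
# Scores per assessee (Column 1) and average number of successful
# predictors (Column 2), as printed in Table 17.
TABLE_17 = {
    "Assessor Ratings": (68, 37.00),
    "Peer Rankings": (15.25, 9.00),
    "Self Rankings": (15.25, 3.50),
    "Entry Interview": (14, 5.50),
    "Pencil and Paper Performance Tests": (9, 7.50),
    "Self-Description Instruments": (75, 13.00),
}

def percent_successful(n_scores, n_successful):
    """Illustrative helper: successful predictors as a percentage of all scores."""
    return 100.0 * n_successful / n_scores

for name, (n_scores, n_successful) in TABLE_17.items():
    print(f"{name}: {percent_successful(n_scores, n_successful):.2f}%")
```

The results agree with the printed Column 3 values to within rounding in the last digit.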

Perhaps  the  most  interesting  data  is  in  Column  5  where  the  assessor 
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Table  16 

PERSON DESCRIPTION BLANK (PDB) "YOURSELF" SCORE
CORRELATIONS WITH CRITERION


                                                        Assessment Group
PDB Descriptor                          IOAC           IOBC           BIOCC          ANCOES

Persuasive (1) / Unpersuasive (7)       -.24(.014)*    -.09            .03           [illegible]
Noncompetitive (1) / Competitive (7)     .14            .20(.032)*    [illegible]    [illegible]
Clumsy (1) / Graceful (7)               -.21(.027)*     .07           -.06           -.04
Understandable (1) / [illegible] (7)    -.04            .03           [illegible]    [illegible]
Capable (1) / Incapable (7)             -.26(.009)**   -.08           -.01           -.08
Smooth (1) / Rough (7)                  -.03           -.03            .22(.012)*     .05
Insensitive (1) / Sensitive (7)          .10            .00            .16(.048)*     .18
Flexible (1) / Rigid (7)                -.02            .05            .18(.032)*    -.09
Plodding (1) / Brilliant (7)             .19(.042)*     .10            .07            .02
Tactful (1) / Blunt (7)                 -.06           -.05            .03            .29(.005)
Tough (1) / Tender (7)                  -.21(.031)*     .00            .13            .14
Wary (1) / Gullible (7)                 -.26(.008)**    .02           -.09           -.11
Slow (1) / Fast (7)                      .30(.003)     -.02            .00            .09
Unintelligent (1) / Intelligent (7)      .32(.002)**    .11            .13            .15
Methodical (1) / Creative (7)           -.15            .08           -.17(.042)*    -.27(.007)
Careful (1) / Reckless (7)              -.02            .08            .21(.018)*     .23(.020)
Funny (1) / Sober (7)                    .11           -.26(.007)**    .03            .04
Leading (1) / Following (7)             -.10           -.22(.019)*     .02            .07
Shortsighted (1) / Farsighted (7)        .15           -.23(.018)*    -.03            .06
Mild (1) / Forceful (7)                  .31(.002)**   -.05            .12           -.14
Ambitious (1) / Complacent (7)          -.10           -.22(.022)*     .04 [one value missing; column uncertain]
Suspicious (1) / Trusting (7)            .27(.007)**    .10           -.01            .23(.023)*
Boring (1) / Interesting (7)            -.10            .19(.037)*    -.06            .11
Quiet (1) / Talkative (7)               -.02            .23(.015)*     .10           -.15
Colorful (1) / Colorless (7)             .03           -.11            .08            .25(.015)*

* p < .05, ** p < .01
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Table  17 


RESULTS FOR SIX DIFFERENT CLASSES OF ACTR SCORES -
ALL ASSESSEE GROUPS COMBINED (END-OF-COURSE GRADE)


Columns:
(1) No. of Scores per Assessee
(2) Avg. No. of Successful Predictors
(3) % Successful Predictors
(4) Assessor Time per Assessee Score (min)
(5) Assessor Time per Successful Predictor (min)
(6) Assessee Time per Score (min)
(7) Assessee Time per Successful Predictor (min)

Class of ACTR Score                   (1)      (2)      (3)      (4)      (5)      (6)        (7)

Assessor Ratings, Formal Exercises    68       37.00    54.41    14.50    26.64    14.24(a)   28.33(a)
Peer Rankings, Formal Exercises       15.25     9.00    59.02     0        0
Self Rankings, Formal Exercises       15.25     3.50    22.95     0        0
Entry Interview                       14        5.50    39.28     4.64    11.82     4.64      11.82
Pencil & Paper Performance Tests       9        7.50    83.33     2.96     3.56    17.78      21.33
Self-Description Instruments          75       13.00    17.33     0.30     1.76     1.83      10.54

(a) Peer and self-rankings included with assessor ratings for these calculations.

time per successful predictor for each class of ACTR score is shown. This
ranges from slightly less than 2 minutes per successful predictor for the
Self-Description Instruments to 26 1/2 minutes per such predictor for the
Assessor Ratings of Formal Exercises.

Also of interest are the figures in Column 7 of Table 17. This is the
assessee time per successful predictor. Although large differences in
assessee time per score (Column 6) exist, when successful prediction is
considered, there is not a great difference between the different classes
of exercise. (From a cost-effectiveness view, assessee time per predictor
is probably less critical than assessor time per predictor. High assessor
cost was one of the main reasons for termination of the USAIS ACTR.)
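
The time per successful predictor follows from the time per score, the number of scores, and the number of successful predictors. The Python sketch below shows the arithmetic under that reading of the Table 17 columns. The function name is illustrative, and the inputs are the Column 1, 2 and 4 figures as printed in Table 17.

```python
def time_per_successful_predictor(time_per_score, n_scores, n_successful):
    """Illustrative helper: total minutes over all scores, divided by the
    number of scores that turned out to be successful predictors."""
    if n_successful <= 0:
        raise ValueError("no successful predictors to divide by")
    return time_per_score * n_scores / n_successful

# Assessor time per score (Column 4), scores per assessee (Column 1) and
# average successful predictors (Column 2), as printed in Table 17.
assessor_ratings = time_per_successful_predictor(14.50, 68, 37.00)
self_description = time_per_successful_predictor(0.30, 75, 13.00)

print(f"Assessor Ratings: {assessor_ratings:.2f} min")
print(f"Self-Description Instruments: {self_description:.2f} min")
```

Both results fall within a few hundredths of a minute of the printed Column 5 entries (26.64 and 1.76); the small differences are consistent with rounding in the printed inputs.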

Tables 18, 19, 20, and 21 provide the data of Table 17 with a separate
breakdown by the different assessee groups. In Column 3 of these tables it
can be seen that the percentage of successful predictors for the Assessor
Ratings ranges from 45% for IOBC assessees, through 51% for IOAC to 60% for
both BIOCC and ANCOES.

Table  22  represents  another  breakdown  of  the  data  in  Table  17  by 
separate  exercises. 

PREDICTION OF END-OF-COURSE GRADE VS. PREDICTION OF FIELD LEADERSHIP
RATINGS

Table 23 presents the percentage of successful predictors of the
end-of-course grade (Column 2) as given earlier in Table 17 and compares it
to the percentage of predictors of the field leadership criterion used in
the earlier validation study of the USAIS ACTR by Dyer and Hilligoss
(Column 4). In addition, the percentage of successful predictors of the
end-of-course grade is given for the different classes of ACTR score when
only the assessees who were included in the earlier validation study were
considered (Column 3). This reduced the number (N) for IOAC from 84 to 36,
the N for IOBC from 87 to 45, the N for BIOCC from 105 to 40, and the N for
ANCOES from 79 to 38.

Although this approximate halving of the N for each assessment group
reduced the number of successful predictors of the end-of-course grade,
there were still more than three times as many successful assessor rating
predictors of the grade as of the field leadership ratings, nearly six
times as many successful peer-ranking predictors, twice as many
self-ranking predictors and three times as many successful paper-and-pencil
test score predictors. The percentage of successful entry-interview and
self-description instrument predictors was about the same for the two
criteria.
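
Why a smaller N yields fewer successful predictors can be sketched numerically: the smallest correlation that reaches the .05 level grows as N shrinks. The Python sketch below uses the Fisher z approximation and a one-tailed test. Both are assumptions made for this illustration, since the report does not state which significance test was applied, and `critical_r` is an illustrative name.

```python
from math import sqrt, tanh
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Smallest positive r significant at level alpha (one-tailed),
    by the Fisher z approximation. Assumed test, not the report's own."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return tanh(z_crit / sqrt(n - 3))

# Group sizes for the full sample and for the assessees who were also in
# the earlier validation study, as given in the text.
GROUP_SIZES = {
    "IOAC": (84, 36),
    "IOBC": (87, 45),
    "BIOCC": (105, 40),
    "ANCOES": (79, 38),
}

for group, (full_n, reduced_n) in GROUP_SIZES.items():
    print(f"{group}: N={full_n} needs r >= {critical_r(full_n):.2f}; "
          f"N={reduced_n} needs r >= {critical_r(reduced_n):.2f}")
```

Under these assumptions a correlation of about .18 suffices at N = 84, while roughly .28 is needed at N = 36, so many modest correlations that count as successful predictors in the full sample would not do so in the reduced one.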
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Table  18 


RESULTS FOR SIX DIFFERENT CLASSES OF ACTR SCORE:
IOBC ASSESSEES (END-OF-COURSE GRADE)

(a) Peer and self-rankings included with assessor ratings for these calculations.

Table 19


RESULTS FOR SIX DIFFERENT CLASSES OF ACTR SCORE:
IOAC ASSESSEES (END-OF-COURSE GRADE)

(a) Peer and self-rankings included with assessor ratings for these calculations.

Table 20


RESULTS FOR SIX DIFFERENT CLASSES OF ACTR SCORE:
BIOCC ASSESSEES (END-OF-COURSE GRADE)

(a) Peer and self-rankings included with assessor ratings for these calculations.

Table 21


RESULTS FOR SIX DIFFERENT CLASSES OF ACTR SCORE:
ANCOES ASSESSEES (END-OF-COURSE GRADE)


Descriptor

No. of Scores
per Assessee

Number of
Predictors

(a) Peer and self-rankings included with assessor ratings for these calculations.


Table 22


RESULTS FOR SEPARATE ACTR EXERCISES FOR ALL
ASSESSEE GROUPS (END-OF-COURSE GRADE)


Descriptor
No. Scores per Assessee
Avg. No. Successful Predictors
% Successful Predictors
Assessor Time per Assessee Score (min)
Assessor Time per Successful Predictor (min)
Assessee Time per Score (min)
Assessee Time per Successful Predictor (min)
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Table  22  (cont'd) 

RESULTS FOR SEPARATE ACTR EXERCISES FOR ALL
ASSESSEE GROUPS (END-OF-COURSE GRADE)

Columns:
(1) No. Scores per Assessee
(2) Avg. No. Successful Predictors
(3) % Successful Predictors
(4) Assessor Time per Assessee Score (min)
(5) Assessor Time per Successful Predictor (min)
(6) Assessee Time per Score (min)
(7) Assessee Time per Successful Predictor (min)

Descriptor                               (1)     (2)     (3)      (4)     (5)     (6)     (7)

Entry Interview                          14      5.50    39.28    4.64    11.82    4.64   11.82

Performance Tests
  Henmon-Nelson                           3      2.50    83.33    2.22     2.67   13.33   16.00
  Nelson-Denny                            4      3.00    75.00    1.67     2.22   10.00   13.33
  Watson-Glaser                           1      1.00   100.00    8.33     8.33   50.00   50.00
  Social Insight                          1      1.00   100.00    5.00     5.00   30.00   30.00

Self-Description Instruments
  Edwards Personal Preference Schedule   15      2.50    16.67    0.56     3.33    3.33   20.00
  Work Environment Preference Schedule    1      0.75    75.00    1.67     2.22   10.00   13.33
  Leadership Opinion Questionnaire        2      0.50    25.00    1.67     6.67   10.00   40.00
  Leadership Q Sort                       7      2.25    32.14    1.19     3.70    7.14   22.22
  Person Description Blank               50      7.00    14.00    0.02     0.17    0.14    1.00

(a) Peer and self-rankings included with assessor ratings for these calculations.
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Table  23 


RESULTS FOR SIX DIFFERENT CLASSES OF ACTR SCORES -
ALL ASSESSEE GROUPS COMBINED
END-OF-COURSE GRADE VS. FIELD LEADERSHIP RATINGS

DISCUSSION


Two  per^ectlves  exist  Ss^.r  discussi^on  cm  r^esf°  resuLJiB*    One  Is  i:a 
terms  of  the  weclflc  characteristics  «b  memair^d  In  the  which  pradict 

leadership  cnnxi^e  performance     f  the  cjff ferteiir  ass&ssee  g&dnpes.    The  crther 
perspective  rrrr  viewing  the*^  Results       in  terms  of  the  g^TTF^ral  question 
of  what  parte  o:c  the  ACTR  wtr^^  affectsve  In  as»essaaent  of  Isdership. 

CHARACTERISTICS OF SPECIFIC ASSESSEE GROUPS

The young Lieutenant who, following the Basic Infantry Course, received
a high end-of-course score judged himself as less sober than his
colleague who did less well in the course. This young officer was also apt
to be rated higher in oral and written communication, writing skills,
decision quality, attention to detail, adaptability, administrative skills,
sense of humor, energetic support of the team effort, and overall good
impression than his peer who received a low course grade. He also was apt
to be higher in reading comprehension.

The Captain who was about to enter the Advanced Infantry Course and
who later received a high end-of-course grade was apt to be more dominant
and to have a lower need for order compared to his colleague who received a
low end-of-course grade. He viewed himself as more capable, wry, [illegible],
intelligent, forceful, and [illegible]. His performance was higher in the
In-Basket, paper and pencil tests, and the Leader Game. Among the leadership
dimensions on which this ACTR Captain was higher were planning ability,
overall effectiveness, analysis of problems, supervision, leadership, and
decisiveness. He tended to perform better in an unstructured environment.
Both mental health and consideration for others showed an inverse
relationship with the criterion for these officers.

The enlisted man about to enter the Officer Candidate Course and who
received a high end-of-course grade was apt to be rated high on the
Leaderless Group Discussion exercise, especially on oral communication. He
did well also on the Writing Exercise (grammar and completeness). His best
exercise was the Radio Simulate, where he received high ratings on social
skills, communication, and forcefulness. As with the IOAC assessees, this
BIOCC assessee did well on the paper and pencil tests. The BIOCC assessee
who had a high end-of-course grade was especially rated high on forcefulness,
decision making, and use of information. As with the successful IOAC
assessee, the successful BIOCC assessee tended to perform better in an
unstructured environment.

The NCO about to enter the Advanced NCO Course, who later received a
high end-of-course grade, did well on the Radio Simulate. Dimensions on
which he did particularly well were communication skills, adaptability,
decision making, administrative skills, and effectiveness in an
organizational leader role. The NCO who was high on course grades also
did well on the paper and pencil tests. This NCO tended to be indifferent
to others, and to lack imagination.

PREDICTIVE VALIDITY OF DIFFERENT CLASSES OF ACTR SCORES

The paper and pencil tests provided the largest proportion of
criterion predictors, followed by the Formal Exercises (Peer Rankings and
then Assessor Ratings). Self-Description Instruments had the smallest
proportion of criterion predictors.

It is not surprising that the paper and pencil tests provided the
largest proportion of criterion predictors, since an end-of-course academic
grade reflects, in part, the student's reading and comprehension skills,
factors which weighed heavily in the paper and pencil test scores. What is
of considerable interest is that the traditional staples of Assessment
Centers, i.e., the assessor ratings on formal exercises, predicted this
course grade criterion so well. In the previous validation study, those
ratings had had almost no predictive validity for the leadership
performance ratings which were the criterion measure specifically designed
to validate the ACTR.

One other strong contrast exists between the present "end-of-course-
grade" validation study and the previous study using field leadership
ratings as a criterion of leadership. ACTR performance often was
negatively correlated with field leadership ratings of the NCOs. This
meant that poor performance on the ACTR often was related to high field
ratings for this group. This applied to many assessor ratings and
particularly to paper and pencil test scores. Few such inverse
correlations were found in the present study (using leadership course
performance as a criterion) for the NCOs or for any other group.

An explanation proposed in the earlier validation study for the failure
of the traditional assessment center exercises to predict the field
leadership ratings was that something other than leadership was being
rated by the superiors, subordinates and peers who provided these ratings.
The success of self-description instruments in predicting the field
leadership ratings suggested that little opportunity had existed in these
peace-time field settings for leadership to emerge and, in its absence, the
leader's self perception was communicated to the raters and used as the
basis for the leadership ratings. In the present study, self-perception
measures did much more poorly than assessor ratings in predicting the
leadership course grades. Although performance in a largely academic
leadership course may not be the best criterion of leadership, the fact
that the ACTR formal exercises did predict this criterion suggests that it
may still be a better criterion than the field leadership ratings. Future
validation studies will use actual superior ratings (OERs) that bear
directly on promotion as a leadership criterion. Promotion itself will
also be used as a criterion for some of these future studies.
VALIDITY OF ASSOCIATE RATINGS OF PERFORMANCE POTENTIAL BY ARMY AVIATORS


BACKGROUND


In response to a TRADOC request, the Fort Rucker Field Unit of the Army
Research Institute for the Behavioral and Social Sciences has undertaken
research to determine attributes which predict aviators who are more potentially
outstanding combat performers. The effort consists of the following
three interrelated tasks: (1) development of an attack pilot profile from
analysis of proven performers (Eastman and Shipley, 1977);
(2) development of a system for assessment of potential attack pilots;
and (3) selection and [evaluation] of AH-1 candidates using findings of
tasks 1 and 2.

Currently, no systematic selection of candidates for AH-1 transition
training exists. Many aviators are assigned to transition training because
they are due for reassignment. A need exists to provide unit
commanders with reliable and valid instruments to select aviators for
AH-1 transition training. If unit commanders had more and better information,
an improved [illegible] of aviators to training assignments would result.
The research reported is part of task [illegible] and was conducted to
determine the predictive validity of unit level ratings of AH-1 candidates.


OBJECTIVES

The principal objective of this research is to determine the validity of
the AH-1 candidate evaluation form as a predictor of trainee performance
in the AH-1 transition [course].

It was hypothesized that AH-1 (COBRA) qualified pilots in FORSCOM units
would be able to predict, by means of associate ratings, the AH-1 training
performance of aviators from their units. It has already been shown
that COBRA pilots in cavalry and attack units demonstrate high inter-rater
reliability when evaluating the potential success of aviators in their units for
AH-1 transition and gunship pilot duties (Eastman and McMullen, 1976).
This study will determine the validity of the Attack Pilot Candidate Evaluation
Form in predicting the flight and gunnery transition grades of AH-1 students.
An additional variable of interest was the relationship between length of
rater-ratee acquaintance and magnitude of the ratings (Freeberg, 1969;
Lewin and Zwany, 1976).


METHOD 


SAMPLE


Ratees: The ratees were 45 FORSCOM aviators, all rotary wing qualified
and assigned to AH-1 transition training at Fort Rucker. The ratees were
selected from AH-1 class rosters if their unit of origin was one with
AH-1 aircraft in the TOE. The units were selected on a worldwide basis
and are representative of aviation units with COBRA pilots.

Raters: The raters were AH-1 qualified aviators from the units of the
AH-1 students. The number of raters in the sample units varied considerably.
Because of the requirements of field duty, not all AH-1 qualified aviators
were available to evaluate the students from their units. However, no
systematic basis for nonavailability which would influence the results of
this study was apparent.

Procedure: The AH-1 transition course lasts 5 weeks, and classes
begin every two weeks. Beginning in Oct 76, when rosters became available
for an incoming class they were examined, and students arriving from units
which were likely to have an attack pilot element were earmarked. The
student's unit was then contacted to confirm that a number of COBRA pilots
were available. Next, a package of rating forms was sent to a point of
contact (POC) such as the unit XO or a senior attack pilot. The POC then
distributed the rating forms and an envelope to all the available AH-1
pilots and later collected them in sealed envelopes to ensure confidentiality.
Finally, the set of rating forms was returned to ARI in a mailer provided for
that purpose. This procedure was followed for all classes during a 14
month period between October 1976 and December 1977. It was necessary to
include this large number of classes because only a minority of AH-1
students met the criteria established. Many of the students who could not
be used were turnaround Initial Entry Rotary Wing (IERW) students who had
just finished flight school. Another large group came from units with no
COBRA pilots.

Rating Scale: The rating form used was designed to have raters discriminate
among ratees on a set of desirable characteristics for attack pilots. The
characteristics  rated  were  identified  during  structured  interviews  of  58 
attack  pilots  with  combat  experience  at  Ft  Knox,  Ft  Hood  and  Ft  Rucker.  On 
the  evaluation  form  the  rater  (the  AH-1  qualified  pilot)  is  instructed  to 
consider  the  set  of  attack  pilot  characteristics  and  to  assign  the  AH-1 
student  a  numerical  rank,  between  1  and  25,  representing  standing  within  a 
typical  group  of  25  pilots.    The  rater  was  also  provided  space  within  which 
to  write  a  2  -  3  sentence  word  picture  justifying  the  numerical  rating 
assigned.    Additional  information  was  also  recorded  on  where  the  rating  was 
conducted  and  the  type  and  duration  of  the  relationship  between  the  rater  and 
ratee.    Detailed  instructions  were  provided,  some  of  which  only  apply  when 
the  rating  form  is  to  be  used  to  rate  a  group  of  AH-1  candidates  (see  Appendix  A). 

RESULTS  AND  DISCUSSION 

The  median  rank  order  rating  was  computed  for  each  student  from  the  set  of 
ratings  received  from  his  unit.    This  measure  was  used  to  predict  two 
criteria:     (1)  AH-1  flight  transition  grades,  and  (2)  AH-1  gunnery  grades. 
The  predictive  validity  of  the  median  rating  was  determined  by  computing 
a  Pearson's  r  between  the  predictor  and  each  criterion  grade.    The  results 
in Table 1 show that the validity coefficient for ratings on flight transition
grades, r = .32, was high enough to be useful as well as statistically
significant (p<.01). By contrast, the lower predictive validity of ratings
for the gunnery phase of AH-1 transition is probably not useful as a
predictor, r = .21. The significant difference between these two validities
(p<.01) may be attributable to differences in the quality of grading the
two phases. During the flight transition phase, performance criteria and
IP standardization have been established for grading AH-1 students. During
the gunnery stage, grading is not based on specified performance criteria,
e.g., accuracy is not graded. Improvements in gunnery grading criteria are
needed before this training performance can be adequately predicted.


TABLE 1


CORRELATIONS BETWEEN TRANSITION GRADES, GUNNERY GRADES AND
THE MEDIAN RATINGS RECEIVED BY AH-1 STUDENTS (N = 45)


Variables                              r      p       r2

AH-1 transition and median rating      .32    <.01    .10

AH-1 gunnery and median rating         .21    <.05    .04

AH-1 transition and gunnery            .33    <.01
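
The analysis described above (a median rank per student, then a Pearson r against each criterion grade, then r2 as the share of variance accounted for) can be sketched as follows. All data below are hypothetical, not the study's; the student labels, the `pearson_r` helper, and the reflection of ranks as 26 minus the median (so that a larger predictor value means a better standing, since rank 1 is best on the form) are assumptions made for illustration only.

```python
from math import sqrt
from statistics import median


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance_explained(r):
    """Proportion of criterion variance accounted for by the predictor (r squared)."""
    return r ** 2


# Hypothetical ranks (1 = best, 25 = worst) given to each student by unit raters.
ratings_by_student = {"A": [3, 5, 4], "B": [12, 9, 15, 10],
                      "C": [20, 18], "D": [7, 8, 6]}
# Hypothetical flight transition grades for the same students.
transition_grades = {"A": 88.0, "B": 84.0, "C": 81.0, "D": 86.0}

students = sorted(ratings_by_student)
# Assumption: reflect the median rank so that higher values mean better standing.
predictor = [26 - median(ratings_by_student[s]) for s in students]
criterion = [transition_grades[s] for s in students]

r = pearson_r(predictor, criterion)
print(round(r, 2), round(variance_explained(r), 2))
```

Applied to the coefficients reported in Table 1, `variance_explained(0.32)` is about .10 and `variance_explained(0.21)` is about .04, which matches the r2 column.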

Although the validities obtained are not very high, the predictive validity
of .32 accounts for more than 10% of the variance in transition grades and
will be useful in selecting AH-1 students. Moreover, the validities reported
are a very conservative underestimate of those which would be obtained
with an unrestricted population of AH-1 candidates. Because the ratees
had already been selected for AH-1 transition, it is reasonable to expect
that the ratings of marginal and average aviators were somewhat inflated.
This was supported by positive skewing of the distribution of ratings, which
suggested the use of the median as a datum. Because these data were
obtained by mail, the number of ratees was probably smaller than would be
possible if ratings had been conducted as a unit level operational
procedure.

The criterion grades for both the transition and gunnery phases are not
very discriminating of training performance because of management and grading
policies/practices which preclude failures and encourage giving 85s to
graduate aviators in advanced training. Some indication of this is provided
by the means and standard deviations of flight transition and gunnery grades
shown in Table 2. Considering these factors, the .32 validity obtained for
prediction of flight transition grades is an encouraging finding in conjunction
with the high reliability demonstrated by aviator associate ratings (Eastman and
McMullen, 1976). Properly used at the unit level, associate ratings would provide
a useful selection tool to unit commanders and training officers.

TABLE 2

MEANS AND STANDARD DEVIATIONS OF AH-1 TRANSITION
AND GUNNERY GRADES (N = 45)


Phase of Instruction      Mean      SD


Flight Transition         84.13     3.04

Gunnery                   85.93     1.77


No  significant  relationship  was  found  (r  =  .09)  between  the  length  of 
acquaintanceship  of  the  rater  and  ratee  and  the  magnitude  of  the  ratings 
given. 

A related AH-1 Candidate Selection Study included an open ended section
in which the rater gave a verbal picture of the ratees. The verbal content
of this section was analyzed for those aviators who scored above average in
the AH-1 transition. The comments for those who were rated high (above 8.0),
medium (8-15), and low (16-25) are presented in Table 3.

CONCLUSIONS 

The validity (r = .32) of ratings in predicting AH-1 flight transition
training grades indicates that ratings of potential transition students
by COBRA pilots would provide useful information to unit commanders
and  training  officers  in  selecting  aviators  for  training.     The  true 
validity  of  ratings  is  anticipated  to  be  somewhat  higher  than  that  obtained 
in  this  study,  because  of  limitations  imposed  by  the  procedures  and 
available  sample. 

Highly  rated  good  students  were  regarded  to  be  aggressive  leaders  while 
the  low  rated  poor  students  lacked  aggressiveness  and  did  not  desire 
gunship  duties.    However,  factors  such  as  dependability  and  team  performance 
emphasized  by  raters  appear  to  contradict  the  self  reported  impulsive/ 
independence  of  the  ACE  group.     The  rater  received  a  questionnaire  to  rate 
the  student  identical  to  the  one  shown  in  Appendix  A. 
TABLE 3

FREQUENTLY OCCURRING REMARKS MADE BY RATERS OF TWO EXTREME GROUPS OF
AH-1 TRANSITION STUDENTS


High Rated Pilots Who Obtained High AH-1 Grades¹

Characteristics                      No. of Times Noted

Dependable                           22
Aggressive                           20
Good team worker                     18
Has leadership qualities             16
Competent                            15


Low Rated Pilots Who Obtained Low AH-1 Grades²

Characteristics                      No. of Times Noted

Lacks aggressiveness
Lacks dependability
Does not desire gunship training
Lacks self discipline
Lacks confidence
Poor team worker
Poor performance as an aviator


¹The high group data is based on 5 pilots evaluated by a total of 46 raters.
²The low group data is based on 4 pilots evaluated by a total of 34 raters.
APPENDIX A


ATTACK PILOT CANDIDATE EVALUATION


Complete this form only if you are AH-1G qualified.

Instructions: 

1.  Evaluate  this  man  in  your  unit/class  in  terms  of  your  estimate  of  his 
potential  ability  to  become  a  successful  gunship/attack  pilot.  Determine 
where  you  think  he  would  rank  in  a  typical  group  of  25  pilots  (number  1 
the  highest  ranking,  25  the  lowest  ranking).    Consider  the  ATTACK  PILOT 
CHARACTERISTICS  below  prior  to  rating  each  man.    Consider  the  entire 
group you are asked to evaluate and the following restrictions before
beginning: (a) no more than two individuals may be placed in the 1-5 column,
and (b) no two individuals will be assigned the same rating number. Do not
rate yourself.

2. Under REMARKS, write a 2-3 sentence word picture to justify the numerical
rating you assigned. State briefly the characteristics (desirable or
undesirable) of this man that impressed you most.

3.  Your  ratings  will  remain  anonymous.  The  packet  you  picked  up  has  an 
ID  number  only  to  insure  that  you  followed  the  restrictions  when  rating. 


EVALUATED INDIVIDUAL'S NAME (Last, First)

ATTACK PILOT CHARACTERISTICS


DATE

DAY MONTH YEAR


DESIRES  GUNSHIP  DUTIES 
TACTICAL  KNOWLEDGE 
TIMELINESS  OF  ACTION 
MECHANICAL  ABILITY 


AGGRESSIVENESS 

SELF-DISCIPLINE 

DRIVE 

EFFECTIVE  MAP  USE 


CONFIDENCE 
TEAMWORK
INITIATIVE 
DEPENDABILITY 




CANDIDATE'S 
PRESENT  LOCATION 
(Circle  one) 

IERW

UNIT 

TRANSITION 
TRAINING 

STANDING  WITHIN  A 
25-MAN  GROUP 
(Circle  one) 

RELATIONSHIP  TO 
CANDIDATE 
(Circle  one) 

HIS 
CO 

IP 

IN  SAME 
UNIT 

1  6     11     16  21

2  7      12     17  22 

3  8      13     18  23 

4  9     14     19  24 

5  10     15     20  25 

REMARKS: 

HOW LONG HAVE YOU KNOWN THE INDIVIDUAL?  YEARS      MONTHS


RATER ID #


[illegible] (ARI) Form 1793, 1 Sep 76, prev ed ob.


PERFORMANCE  TEST  OBJECTIVITY:     COMPARISON  OF  INTERRATER 
RELIABILITIES  OF  THREE  OBSERVATION  FORMATS 


William  A.  Nugent  and  Gerald  J.  Laabs 


Navy  Personnel  Research  and  Development  Center 
San  Diego,  California  92152 


INTRODUCTION 

The current methods for evaluating performance in the Navy consist of a
variety of techniques that often are not adequately assessed in terms of
validity, reliability, or objectivity prior to their use as measures of job
performance. While some of these procedures may yield usable information
related to job performance, the accuracy of that information may be limited.
For example, a portion of the performance evaluations conducted in the Navy
is based upon a rater's judgment concerning observed performance. It is
typically assumed that when such "hands-on" performance tests are conducted,
the tests and resultant data are valid and reliable. However, if any ambiguity
exists in terms of the performance steps to be observed and evaluated, reduced
agreement among the various raters is sure to result, which seriously
affects the validity and reliability of the evaluation procedure. The reduced
agreement that occurs among raters in this type of evaluative situation stems
primarily from a lack of test objectivity.

Objectivity in performance testing refers to the consistency with which
raters make their judgments. One of the important variables that directly
affects test objectivity is that of test format. Without specific guidelines
on what steps or processes to observe, a rater is forced to make subjective
judgments that are based on personal standards and prejudices. Raters should
not be expected to evaluate steps they cannot see, such as those involved
in evaluating a mental process, and each step should be clearly stated. When
several ongoing processes are observed and evaluated as a single step, or
there is ambiguity as to what constitutes a performance step, it becomes
difficult to obtain consistent ratings across raters. On the other hand,
the more structured a test format is, the more the raters should agree on
completion of steps in a problem.

Another  important  variable  that  may  interact  with  test  objectivity  is 
the  expertise  of  the  rater.     The  degree  of  experience  that  raters  have  with 
a  particular  piece  of  equipment  will  influence  their  judgments  of  how  others 
use it.    Within the context of the Navy's Personnel Qualification Standards
(PQS) program, for example, job performance evaluations are conducted by senior
supervisory personnel, or by job incumbents who have successfully passed
the  section  to  be  evaluated.    Unfortunately,  it  is  assumed  that  when  raters 
are  qualified  in  this  manner,  questions  concerning  the  objectivity  of  the 
hands-on  performance  test  are  not  relevant. 
Purpose

The primary purpose of this study [was to compare the]
agreement and reliability of ratings obtained when three [different]
rating formats were used to [illegible] to accurately evaluate the
[performance of others as a function of] skill proficiency within a given
task area.

METHOD 


Stimulus  Materials 

[An] appropriate method for determining the consistency of [rater judgments]
is to hold constant the behavior to be observed. [One way that behavior]
can be held constant is by videotaping [the performance, so that any variation]
in evaluation would be due to rater or [format differences and not] due to test
performance differences. Therefore, a [videotape was produced] in which Navy
employees performed four electrical [measurement problems:] negative DC voltage
measurement, positive DC voltage measurement, and two resistance measures. These
measurements were performed using a Simpson [illegible]
and a Hydrotronics Test Signal Generator. The latter device [was] specifically
designed to provide electronic signals for a previous research [project on]
the use of test equipment (Laabs, Panell, & Pickering, 1977).

The electrical measurement problems were presented [illegible]
three times in the sequence described [above]. [Each] electrical measurement
was performed correctly on only one of the [three] showings. On the two
remaining presentations, the [measurement was performed with] errors that
varied in magnitude. [The videotape] consisted of 12 videotaped
[segments]. The [four types of] measurements were presented
in sequential order, while the correct and incorrect performances were
presented randomly.

Of the eight videotaped segments [performed with errors],
only six gave incorrect meter readings. [Ratings of these six segments]
were compared to those of the four segments performed correctly. The criterion
for the assignment of a pass or fail [was established]
on the basis of the meter readings obtained in these ten problems only. The
two remaining videotaped segments contained only minor procedural errors and
were not included in the present analysis.

The format of the videotaped segments [illegible]
the steps [illegible]
measurement problem. The videotaped segments [ended with a] statement
that the problem had been completed, and the examinee [reporting] the
final VOM reading obtained.

Rater Evaluation Forms

One of three different evaluation forms [illegible]:
a structured, semi-structured, and unstructured format.
The unstructured rating format was modeled after a part of the Personnel
Qualifications Standards program. This form required the rater to evaluate
overall performance, marking a pass or fail for each measurement problem and
recording the errors detected. No structured step-by-step procedures were
provided to make the evaluations, nor were any criteria specified for a passing
performance on any problem.

The  semi-structured  rating  form  is  similar  to  forms  the  Navy  Personnel 
Research  and  Development  Center  has  developed  and  used  in  the  past.    The  form 
was  adapted  from  a  portion  of  a  performance  testing  program  associated  with 
a  self-paced  test  equipment  course  that  is  currently  administered  at  the 
Submarine  Training  Center,  Charleston,  South  Carolina.     This  method  required 
the  rater  to  evaluate  the  videotaped  segments  against  a  number  of  structured 
areas  of  performance,  assigning  a  predetermined  weighted  value  to  each  area. 
An  area  of  performance  often  involved  more  than  one  procedural  step.  When 
the performance was completed, the rater summed the individual point values
assigned to the performance areas to determine whether the criterion for passing,
i.e., 7.5 points out of 10 (or 75% correct), had been met.
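
The scoring rule for the semi-structured form can be sketched in a few lines: sum the points awarded across performance areas and compare the total with the 7.5-point criterion. The area names and point values below are hypothetical, since the paper does not list the areas or their weights, and treating exactly 7.5 points as a pass is an assumption.

```python
PASSING_POINTS = 7.5  # criterion from the text: 7.5 of 10 points (75% correct)


def semi_structured_result(area_points):
    """Return (total points, passed?) for one rated performance.

    area_points maps each performance area to the weighted points the
    rater awarded for it. Assumes a total of exactly 7.5 counts as passing.
    """
    total = sum(area_points.values())
    return total, total >= PASSING_POINTS


# Hypothetical areas and awarded points for one videotaped segment.
example = {
    "meter setup": 2.5,
    "lead connections": 2.5,
    "range selection": 2.0,
    "meter reading": 1.0,
}
print(semi_structured_result(example))
```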

The  structured  rating  form  was  developed  specifically  for  this  study. 
It  required  the  rater  to  evaluate  the  videotape  segments  against  a  series 
of  procedural  steps,  each  consisting  of  a  single  behavior.     In  addition,  this 
form  required  that  each  step  be  performed  in  the  correct  sequence  to  receive 
a  passing  score.     The  VOM  equipment  face  was  reproduced  on  the  form  so  that 
the  position  of  control  settings,  the  location  of  lead  connections,  and  the 
final  meter  reading  obtained  could  be  easily  noted  or  marked  on  the  response 
form. 
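
One way to express the structured form's sequence requirement is as an ordered-subsequence check: every required step must appear among the observed steps, and in the required order. This is a minimal sketch under assumptions the paper does not settle. The step names are hypothetical, and allowing extra, unrequired actions between required steps is a design choice made here, not something the source specifies.

```python
def structured_pass(required_steps, observed_steps):
    """True only if all required steps occur in observed_steps in order.

    The iterator is consumed as each required step is found, so a step
    that appears before its predecessor cannot be matched later.
    """
    remaining = iter(observed_steps)
    return all(step in remaining for step in required_steps)


# Hypothetical step names for one measurement problem.
required = ["select function", "select range", "connect leads", "read meter"]

in_order = ["select function", "select range", "zero meter",
            "connect leads", "read meter"]
out_of_order = ["select range", "select function",
                "connect leads", "read meter"]

print(structured_pass(required, in_order))
print(structured_pass(required, out_of_order))
```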

To develop the structured rating format, a preliminary version was presented
to 12 Sonar Technician Class "A" School instructors from the Fleet
Anti-Submarine  Warfare  School,  San  Diego,  and  they  were  asked  to  indicate 
each  step  of  the  procedure  that  was  mandatory  to  achieve  a  passing  performance 
for  each  problem.     The  final  version  of  the  structured  rating  form  consisted 
only  of  those  steps  that  85%  of  the  instructors  considered  essential  for 
passing.    Furthermore,  there  was  general  agreement  among  the  instructors  on 
the  sequential  order  for  the  completion  of  the  steps  that  were  retained  for 
each  of  the  four  measurement  problems. 
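
The step-retention rule used to build the final structured form can be illustrated as a simple filter. The sketch assumes "85% of the instructors" means at least 85%, which with 12 instructors works out to 11 or more endorsements; the step names and endorsement counts are hypothetical.

```python
def retained_steps(endorsements, n_instructors=12, threshold=0.85):
    """Keep the steps endorsed as essential by at least `threshold` of instructors.

    endorsements maps each candidate step to the number of instructors
    who marked it mandatory for a passing performance.
    """
    return [step for step, count in endorsements.items()
            if count / n_instructors >= threshold]


# Hypothetical endorsement counts from the 12 instructors.
counts = {
    "select DC volts function": 12,
    "zero the meter": 9,
    "connect leads": 11,
    "read scale": 10,
}
print(retained_steps(counts))
```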

Rater  Expertise 

The second independent variable studied was rater expertise. Expertise
level was determined by the score the rater obtained on a VOM proficiency
test. This test consisted of the same four types of electrical measurement
problems that the raters were asked to evaluate during the videotaped
presentations. The VOM proficiency test also used the same equipment as
that used in the production of the videotaped segments.

Two  proficiency  level  categories  were  established  for  the  rater  expertise 
variable:     raters  who  passed  two  or  more  problems  out  of  four  were  considered 
to  be  high  skill  proficient;  whereas  raters  who  failed  to  pass  at  least  two 
of the four problems were considered to be low skill proficient. The structured
rating form was used by a member of the research staff to evaluate the
proficiency  level  of  the  raters. 
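
The two-level expertise classification reduces to a single cutoff on the number of proficiency-test problems passed. A minimal sketch, with the function name and return labels chosen here for illustration:

```python
def skill_level(problems_passed, total_problems=4):
    """Classify a rater as 'high' or 'low' skill proficient.

    Follows the rule in the text: two or more of the four problems
    passed is high; fewer than two is low.
    """
    if not 0 <= problems_passed <= total_problems:
        raise ValueError("problems_passed must be between 0 and total_problems")
    return "high" if problems_passed >= 2 else "low"


print([skill_level(n) for n in range(5)])
```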
Testing was conducted in an experimental laboratory at the Navy Personnel
Research and Development Center, San Diego. One half of the rater sample received
the VOM proficiency test prior to the videotaped presentations; the other half after
viewing  the  videotape.     In  both  conditions,  raters  were  tested  individually 
on  the  VOM  proficiency  test  and  each  rater  was  assigned  to  use  one  of  the 
three  rating  forms  on  a  random  basis. 

Raters  conducted  their  evaluations  of  the  videotaped  segments  in  groups 
of two or three at individual television monitor carrels so that one rater's
judgment  would  not  influence  another.     Prior  to  evaluating  the  videotape 
segments,  raters  were  given  a  practice  session  to  become  familiar  with  the 
composition  of  the  videotaped  presentations  as  well  as  the  rating  format  they 
had  been  assigned.    The  raters  viewed  each  segment,  consisting  of  a  single 
electrical measurement problem, only once. Following each segment presentation,
raters were given a 30-second time period to complete entries on their
rating  forms.     The  forms  were  collected  when  the  raters  had  completed  their 
evaluation  of  the  final  videotaped  segment. 

As discussed previously, the three rating formats differed from one another
with respect to the process by which performance steps were observed and
evaluated for each videotaped segment. However, the three forms were
comparable in that they provided raters with a means of judging the overall
product (i.e., assigning a passing or failing score for each electrical
measurement problem). Consequently, the criterion by which the performance of
the raters was measured involved comparison of the rater's dichotomous
pass/fail responses for each segment to the predetermined standard for the
10 videotaped presentations that were analyzed.
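
The criterion measure described above can be sketched as follows. This is a minimal illustration only: the function name and the sample responses are hypothetical, and the split of the 10 segments into 4 passing and 6 failing performances is assumed from the six incorrect performances mentioned later in the paper.

```python
def percent_agreement(responses, standard):
    """Percent of segments on which a rater's pass/fail call matches the standard."""
    if len(responses) != len(standard):
        raise ValueError("responses and standard must cover the same segments")
    matches = sum(r == s for r, s in zip(responses, standard))
    return 100.0 * matches / len(standard)


# Hypothetical standard for 10 segments (True = pass, False = fail).
standard = [True, False, False, True, False, True, False, False, True, False]
# Hypothetical rater who disagrees with the standard on the third segment only.
rater = [True, False, True, True, False, True, False, False, True, False]

print(percent_agreement(rater, standard))  # 90.0
```

Averaging this value over the raters who used a given form would give a mean percent agreement of the kind reported in the Results.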

Sample 

A total of 15 instructors and 63 students from the Anti-Submarine Warfare
School, San Diego, participated in the study. The students in the study were
either designated Sonar Technicians or were undergoing Class "A" School
training in that rating.

Of the 78 raters tested, 28, 26, and 24 raters were assigned on a random
basis to the structured, semi-structured, and unstructured formats,
respectively. On the basis of the VOM proficiency test, 16 of the raters who
used the structured format were classified as high skill proficient and 12
as low skill proficient. Of the raters who used the semi-structured format,
14 were classified as high skill proficient and 12 as low skill proficient.
Finally, 12 of the raters who used the unstructured format were classified
as high skill proficient and 12 were classified as low skill proficient.

RESULTS 

Proficiency  Test/Rating  Form  Presentation  Order 

No differences were found in terms of criterion agreement with the videotaped
presentations as a function of whether the VOM proficiency test was
given  before  or  after  viewing  the  videotape.    Significant  differences  also 
failed  to  appear  in  terms  of  correct  performances  on  the  VOM  proficiency  test 
as  a  function  of  whether  the  videotaped  presentations  were  shown  before  or 
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after  the  VOM  proficiency  test.  Therefore,  for  all  remaining  analyses,  raters 
were  collapsed  across  presentation  order. 


Interrater Reliability

An estimate of interrater reliability was calculated for each form by
applying an analysis of variance technique that yields an intra-class
correlation (Winer, 1971, p. 283) to the dichotomous pass/fail responses.
Raters who used the structured rating form showed the highest interrater
reliability, with a coefficient of reliability of .996. The reliability
coefficients for the semi-structured and unstructured formats were .973 and
.808, respectively. These coefficient differences were tested by a chi
square analysis (Snedecor & Cochran, 1967, p. 286) and were found to be
statistically significant (χ² = [?]2.4, df = 2, p < .001). Although the
structured rating form had the highest coefficient of interrater reliability,
the semi-structured and unstructured forms appear to have acceptable levels
of interrater reliability in terms of evaluating overall performance on the
videotaped segments.
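
The paper does not say which intra-class formula was taken from Winer. As a sketch under that caveat, the following computes two common one-way analysis of variance estimates from a segments-by-raters matrix of 0/1 (fail/pass) scores: the reliability of a single rater and the reliability of the mean of all k raters. The function name and the small data set are illustrative assumptions, not study data.

```python
def intraclass_correlations(ratings):
    """One-way ANOVA intra-class correlations.

    ratings: list of rows, one per rated segment; each row holds the 0/1
    (fail/pass) scores given by the same number k of raters.
    Returns (single-rater estimate, estimate for the mean of k raters).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Mean square between segments and mean square within segments.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_mean = (ms_between - ms_within) / ms_between
    return icc_single, icc_mean


# Hypothetical data: 4 segments rated by 3 raters, one disagreement.
example = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
icc_single, icc_mean = intraclass_correlations(example)
print(round(icc_single, 3), round(icc_mean, 3))  # 0.727 0.889
```

The estimate for the mean of k raters rises with the number of raters, so coefficients close to 1.0 are plausible when 24 to 28 raters score the same segments. The sketch fails with a division by zero if every segment receives the same mean score.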

No  significant  differences  were  found  in  interrater  reliability  values 
within  each  rating  form  as  a  function  of  rater  skill  proficiency. 

Criterion  Agreement 

Table  1  provides  a  summary  of  the  mean  percent  agreement  with  the  pre- 
determined pass/fail  criterion  across  the  three  rating  formats.     The  table 
shows  that  the  use  of  the  structured  rating  format  resulted  in  the  highest 
average  percent  of  criterion  agreement,  while  the  use  of  the  semi-structured 
and  unstructured  formats  resulted  in  progressively  less  average  agreement. 


Table 1

Mean Percent Agreement with the Pass/Fail Criterion
Across Three Formats

                         Rating Format

        Structured    Semi-Structured    Unstructured

M          97.1            80.7              76.7

SD          4.6            12.0              14.3


Individual rater percentage values across all three rating forms were
converted to standard scores, and an analysis of variance was performed.
The main effect of rating format was found to be statistically significant
(F(2, 75) = 26.34, p < .001). A Scheffe post hoc analysis of the mean values
revealed that the structured rating format differed significantly from the
semi-structured and unstructured formats at the p < .01 level. An estimate
of the overall strength of association between rating format and criterion
agreement was also calculated. The estimate showed that 39 percent of the
variance in the dependent variable can be accounted for by the independent
variable.
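
The index used for strength of association is not named in the text. As a hedged check, omega squared for a one-way fixed-effects design, computed from the reported F ratio with 2 degrees of freedom between formats and 78 raters in total, reproduces the 39 percent figure. The function name is illustrative.

```python
def omega_squared(f_ratio, df_between, n_total):
    """Estimate omega squared for a one-way fixed-effects ANOVA from its F ratio."""
    numerator = df_between * (f_ratio - 1.0)
    return numerator / (numerator + n_total)


estimate = omega_squared(26.34, df_between=2, n_total=78)
print(round(estimate, 2))  # 0.39
```

Eta squared computed from the same F ratio would be about .41, so the reported value is more consistent with omega squared, although the source does not confirm this.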

No significant differences were found in the amount of criterion agreement
as a function of rater skill proficiency.

Observation  Errors  on  Failed  Problems 

The above findings clearly indicate that product judgments (i.e., assigning
pass/fail scores) are best made using the structured format. However, these
data do not fully describe the state of affairs in using the different formats
because they do not reflect the errors made in observing the processes or
the procedural steps in the electrical measurement problems. For example,
the assignment of a failing score that was in agreement with the predetermined
criterion could be made for the wrong reason. This might involve an error
of omission (failure to identify an incorrect procedural step) coupled with
an error of commission (identifying a correctly performed procedural step
as incorrect). Although the three formats were, by design, not equivalent
in terms of the amount of information related to process judgments, it was
felt that a more detailed examination of the errors made when observing the
six videotapes of incorrect performances would be useful.

Table 2 shows the average percent of errors of omission for the three
formats. For the structured and semi-structured formats, this means that
an incorrect step was marked as correct or that points were not subtracted
for the incorrect step, respectively. For the unstructured format, this means
that the error was not written down. Unfortunately, there is no way of
determining whether the rater did observe the incorrectly performed step but
merely neglected to enter the error on the observation sheet. Thus, the
percent of errors of omission for this format might be inflated. Nevertheless,
there was a much lower percent of errors of omission associated with the
structured format, which supports the findings on criterion agreement across
rating formats.


Table 2

Mean Percent of Errors of Omission for Three Formats

                         Rating Format

        Structured    Semi-Structured    Unstructured

M           7.1            20.2              50.5

SD          8.6            13.3              28.2

The other error that could occur on the six failure trials is that of
commission. For the structured and semi-structured formats, this means a
correct step was marked as incorrect or that points were subtracted for a
correct step, respectively. For the unstructured format, this means that
a correct step was written down as incorrect. Again, there is no way of
knowing if other correct steps were observed as incorrect but simply not
entered on the observation sheet. Table 3 shows the percent of raters at
both skill levels, and within each rating format, that committed at least
one error of commission. Inspection of the table shows that skill proficiency
of the rater does not appear to make a difference unless the structured format
is used to observe the performance. Overall, the structured rating format
is associated with the lowest percentage of raters committing errors of
commission (46.4%), with the semi-structured and unstructured formats showing
much higher percentages of raters committing these errors (92.3% and 95.8%,
respectively).


Table 3

Percent of Raters Making Errors of Commission Across Three
Formats and Two Skill Categories

                                   Rating Format

Skill Category    Structured    Semi-Structured    Unstructured

High                 18.8            92.9              91.7

Low                  83.3            91.7             100.0
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CONCLUSIONS 

A drop from almost perfect agreement with the overall pass/fail criterion
when raters used the structured rating format to about 77 percent when raters
used the unstructured format demonstrates the importance of providing a list
of unambiguous step-by-step procedures to be checked off when observing
hands-on performance. This finding is further reinforced by the fewer errors
of omission and commission committed by raters who used the structured format.

It is interesting to note that the relatively poorer showing for the
semi-structured and unstructured formats in terms of overall criterion
agreement and errors of omission and commission occurred for both the high
skill and
low  skill  proficient  groups.     This  means  that  being  an  expert  in  a  given 
performance  area  does  not  necessarily  guarantee  that  all  steps  in  a  given 
job  task  will  be  correctly  observed  and  evaluated  by  raters  who  used  these 
performance  evaluation  forms. 

The listing of unambiguous step-by-step procedures also resulted in high
interrater reliability or objectivity for the structured rating format. With
less structure in the rating format, there was less objectivity in observing
and evaluating both passing and failing performances. In addition, the level
of rater skill proficiency became more important on the structured rating
form when errors of commission were examined. Significantly fewer raters
in the high skill proficient group made errors of commission than in the low
skill proficient group (t = 3.38, df = 26, p < .01).
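
The text does not state how this statistic was computed. As a rough check only, a pooled two-proportion test statistic comes out close to the reported value when the counts are inferred from the Table 3 percentages and the group sizes given under Sample (about 3 of 16 high skill raters and 10 of 12 low skill raters making at least one error of commission). These counts and the function name are assumptions, not figures stated by the authors.

```python
import math


def pooled_two_proportion_stat(x1, n1, x2, n2):
    """Test statistic for the difference between two proportions (pooled variance)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / standard_error


# Inferred counts (assumption): 3 of 16 high skill, 10 of 12 low skill raters.
stat = pooled_two_proportion_stat(3, 16, 10, 12)
print(round(stat, 2))  # 3.39
```

With only 28 raters, an exact test would be a more cautious choice today; the sketch is meant only to show that the reported value is of the expected size.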

This finding suggests that high skill proficient raters are more apt to
accurately observe and evaluate the process by which the electrical
measurements were performed. The failure to achieve significant differences
between high and low skill proficient raters with respect to commission
errors on the two remaining formats may be attributed to a lack of specificity
in the performance steps to be observed and evaluated. Thus, no matter what
the format of the observation form to be used, the skill proficiency of a
rater should probably not be ignored.

The unstructured and semi-structured formats are presently in use in the
Navy to evaluate hands-on job performance. It is clear that if these rating
formats were replaced by more structured rating forms, more reliable, valid,
and objective measurements of hands-on job performance would result.
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The  development  of  measures  for  the  prediction  of  Army  officer 
performance requires evaluation of the utility of these measures within
different samples. Other research (Gilbert, 1976) focused on the validation
of certain indices within broadly defined groups. These groups were
the Combat Arms branches, Combat Support branches, and the Service Support
branches. This research was designed to explore the utility of
certain of these measures in the Field Artillery as the beginning of a
validation of these predictors in each of the Army career branches.
Another aspect involved was to explore the possible relationship of
major field of college study to performance on the prediction and criterion
measures.


The first objective of this research was to compare the performance of
Field Artillery officers on certain cognitive and non-cognitive measures
with that of officers in the other Army career branches. The second
objective was to determine the effectiveness of these measures in predicting
officer performance early in their active duty tour. The third objective
was to evaluate differences in performance among officers who pursued
different fields of study while in college on the prediction and on the
criterion measures.


Procedure 


Data were obtained on 610 Field Artillery officers who entered on
active duty during the 1973 Fiscal Year and who continued on active duty
after completion of the Officer Basic Course (OBC). The Officer Evaluation
Battery (OEB) was administered to these officers during the Officer
Basic Course. The OEB consists of cognitive and non-cognitive measures.
The seven subtests are Combat Leadership (Cognitive), Technical-Managerial
Leadership (Cognitive), Career Potential (Cognitive), Combat Leadership
(Non-Cognitive), Technical-Managerial Leadership (Non-Cognitive), Career
Potential (Non-Cognitive), and Career Intent. The description of the
items in each of the subtests of the Officer Evaluation Battery is shown
in Table 1. Two criterion measures were used. The first criterion of
performance used was the final course grades in the Officer Basic Course.
Officer Efficiency Report (OER) ratings obtained during the first year of
active duty were used as the second criterion.

¹The views expressed in this paper are those of the authors and do not
necessarily reflect the view of the U. S. Army Research Institute or
the Department of the Army.
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Table  1 

Officer Evaluation Battery (OEB) Subtests and Description of Items


SUBTEST                              DESCRIPTION OF ITEMS

Combat Leadership (Cognitive)        Military tactics; practical skills
                                     in a variety of areas ranging from
                                     outdoor activities to mechanical
                                     and electronic applications

Technical-Managerial Leadership      History; politics; culture;
(Cognitive)                          mathematics; physical sciences

Career Potential (Cognitive)         Technological knowledge relevant
                                     to military requirements

Combat Leadership (Non-Cognitive)    Combat leader qualities, occupational
                                     interests, sports interest, outdoor
                                     interests related to combat
                                     leadership

Technical-Managerial Leadership      Mathematics and physical sciences
(Non-Cognitive)                      skills and interest; urban or rural
                                     background; scientific interest and
                                     ability; decisive leader qualities;
                                     and verbal-social leadership

Career Potential (Non-Cognitive)     Clerical-administrative interest,
                                     manual versus white collar interest,
                                     combat interest

Career Intent                        Intention of making the Army a career
                                     choice
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The  first  analysis  involved  comparing  the  mean  performance  of  Field 
Artillery  officers  with  the  mean  performance  of  officers  in  the  other 
career  branches  on  the  seven  subtests  of  the  Officer  Evaluation  Battery 
(OEB). Two analyses of regression were performed using the seven subtests
of the OEB as predictors. In one analysis, Officer Basic Course grades
were the criterion, while in the other, the criterion was the Officer
Efficiency  Report  (OER)  ratings  earned  during  the  first  year  of  active 
duty.    The  sample  was  then  divided  into  five  groups  on  the  basis  of  major 
study  field  pursued  by  the  officers  while  in  college.    These  five  groups 
were  Humanities,  Business,  Engineering,  Physical  Sciences,  and  Social 
Studies.    Analysis  of  variance  was  used  to  evaluate  the  differences  among 
the five groups on each of the prediction and criterion measures.

Results  and  Discussion 

Table 2 shows the means of the sample of Field Artillery officers and
the means of officers in other branches on the seven subtests of
the Officer Evaluation Battery. There were no differences between the
means of the two groups on six of the subtests. The mean for the Field
Artillery officers was higher than for other officers on the Career
Potential (Non-Cognitive) subtest at the .01 level.

The zero order correlations between each of the subtests of the
Officer Evaluation Battery and Officer Basic Course final grades are shown
in Table 3 as well as the resulting multiple correlation coefficient.
The correlations of the OEB cognitive scales with this criterion are all
significant at the .01 level. Two of the non-cognitive subtests,
Technical-Managerial Leadership (Non-Cognitive) and Career Intent, also yield
correlations with this criterion that are significant at the .01 level.
Two non-cognitive subtests, Combat Leadership (Non-Cognitive) and Career
Potential (Non-Cognitive), yielded low and non-significant correlations with
Officer Basic Course final grades. All of the seven scales of the OEB
yielded a multiple correlation of .44 with the criterion that was
significant at the .01 level.

When the zero order correlations of the OEB subtests with the
criterion of 1974 Annual Average Officer Efficiency Reports, also shown
in Table 3, are evaluated, only the Technical-Managerial Leadership
(Non-Cognitive) subtest yielded a significant correlation with this criterion.
The obtained multiple correlation of .21 was significant at the .01 level.

In Table 4, the means of the five different college majors are presented
for the seven OEB subtests. Significant differences among the five groups
were obtained at the .01 level on six of the seven subtests of the OEB. There
were not any differences among the groups on the Combat Leadership
(Cognitive) subtest.
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TABLE  2 

COMPARISON OF FIELD ARTILLERY OFFICERS
WITH OFFICERS IN OTHER BRANCHES
ON THE OFFICER EVALUATION BATTERY (OEB)
SUBTESTS


                                                      Mean

                                           Field Artillery   Other Branches
Variable                                       (N=610)          (N=3,947)

Officer Evaluation Battery (OEB)

Combat Leadership (Cognitive)                  105.14            103.37

Technical-Managerial Leadership
(Cognitive)                                    108.36            106.44

Career Potential (Cognitive)                   101.85            101.90

Combat Leadership (Non-Cognitive)              108.30            106.61

Technical-Managerial Leadership
(Non-Cognitive)                                101.51            102.57

Career Potential (Non-Cognitive)**             106.87            103.53

Career Intent                                  114.92            114.53


**Indicates a significant difference on this variable at the .01 level.
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TABLE  3 

CORRELATIONS OF THE OFFICER EVALUATION
BATTERY WITH THE TWO CRITERION
MEASURES


                                        Officer Basic       1974
                                        Course Final        Annual OER
                                        Grades (N=576)      (N=471)

Combat Leadership (Cognitive)               .34**              .07

Technical-Managerial Leadership
(Cognitive)                                 .32**              .01

Career Potential (Cognitive)                .32**              .09

Combat Leadership (Non-Cognitive)           .0[?]              .09

Technical-Managerial Leadership
(Non-Cognitive)                             [illegible]**      .14**

Career Potential (Non-Cognitive)            .06                .01

Career Intent                               .15**              .08

Multiple Correlation                        .44**              .21**

**Significant at the .01 level.

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Table  4 

Means for the Five Groups of College Majors


                          Humanities  Business  Engineering  Physical  Social
                                                             Sciences  Studies
                            (N=20)    (N=106)     (N=30)     (N=269)   (N=156)

Officer Evaluation
Battery

Combat Leadership
(Cognitive)                  97.15    101.73     106.27      107.11    104.74

Technical-Managerial
Leadership (Cognitive)**    100.10     99.77     115.67      114.68    104.67

Career Potential
(Cognitive)**               100.80     99.76     114.27      102.52     99.53

Combat Leadership
(Non-Cognitive)**           114.25    104.65     118.90      107.21    110.30

Technical-Managerial
Leadership
(Non-Cognitive)**           103.50     96.60     112.17      102.68    100.63

Career Potential
(Non-Cognitive)**           108.15     94.74     113.37      109.99    109.43

Career Intent**             121.00    117.51     117.20      111.40    118.16

Officer Basic Course
Final Grades                 96.00    102.11     104.90       98.94    100.13

1974 Annual OER Score       101.17    100.05      97.18      101.39     98.87

**A significant difference among groups on this variable at the .01 level.
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On  the  Technical-Managerial  (Cognitive)  subtest  the  Engineering  and 
Physical  Sciences  majors  were  favored  in  that  order  in  terms  of  average 
performance;  those  officers  who  majored  in  Business  had  the  lowest  mean 
performance  on  this  subtest.    Engineering  majors  were  favored  on  the 
Career  Potential  (Cognitive)  subtest  while  those  officers  who  majored  in 
Social  Studies  had  the  lowest  average  performance  on  this  subtest.  Those 
officers  who  majored  in  Engineering  and  in  Humanities  had  higher  average 
performance  on  the  Combat  Leadership  (Non-Cognitive)  subtest.  Engineering 
majors were favored on the Technical-Managerial (Non-Cognitive) subtest
and on the Career Potential (Non-Cognitive) subtest. Those officers who
majored in Humanities had the highest mean performance on the Career
Intent scale. There was not any significant difference among the five
groups on the criterion measures (i.e., Officer Basic Course final grades
or the Officer Efficiency Report ratings earned during the first year of
active duty).

Results  of  this  research  indicate  that  Field  Artillery  officers  are 
not  any  different  from  officers  in  the  other  Army  career  branches  on  the 
cognitive  and  non-cognitive  subtests  of  the  Officer  Evaluation  Battery 
(OEB) with one exception. Field Artillery officers have higher scores on
the Career Potential (Non-Cognitive) subtest of the OEB, which is
essentially a measure of interest in clerical-administrative, manual versus
"white-collar," and combat types of activities.

The  Officer  Evaluation  Battery  (OEB)  is  a  substantial  predictor  of 
success  in  the  Officer  Basic  Course  for  Field  Artillery  officers.  The 
predictive utility of the Officer Evaluation Battery is lower when used
in the prediction of Officer Efficiency Report (OER) ratings but is still
significant (as indicated by a multiple correlation of .21, significant
at the .01 level).

Differences in performance on the Officer Evaluation Battery subtests,
with the exception of the Combat Leadership (Cognitive) subtest, were
obtained among officers who pursued different fields of college study.
Future research will utilize this finding to obtain more accurate estimates
of the predictive utility of the instrument.


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Construct  Validity 


Brian  S.  O'Leary 
U.  S.  Civil  Service  Commission 


Introduction 

The  Professional  and  Administrative  Career  Examination 
(PACE)  is  the  examination  used  for  the  selection  of  personnel 
for  over  100  Federal  professional  and  administrative  occupa- 
tions requiring  a  college  degree  or  equivalent.     The  written 
test  portion  of  the  PACE  measures  five  abilities  which  are 
differentially  weighted  according  to  the  requirements  of  each 
occupation  to  which  they  are  applied.     The  five  abilities  mea- 
sured in  the  examination  were  selected  based  on  an  analysis  of 
the  requirements  of  the  occupations.     A  construct  validation 
model  was  used  in  the  development  of  the  written  examination. 


Construct  Validation  Model  in  the  Employment  Setting 

Few  organizations  have  used  a  construct  validation  model 
with  employment  tests.     Some  investigators  have  employed  a 
construct model within a single occupation. For example,
Bownas & Heckman (1977) used a construct model in developing
a test for selecting firefighters. To my knowledge, CSC is
the only organization which has used a construct model across
occupational groups.

At  one  time  the  construct  model  was  not  well  accepted. 
However, the courts now give it equal weight with the other
validity models. Moreover, there appears to be a definite
change in the professional climate concerning construct
validity. In fact, the American Psychological Association,
in its comments on the proposed testing guidelines, states
that the construct validity section is one of the most
important in the guidelines.

Perhaps  the  biggest  drawback  with  the  construct  model  is 
that  the  necessary  operational  steps  are  not  well  defined. 
Cronbach  and  Meehl's  (1955)  classic  construct  model  with  the 
large  nomological  nets  may  be  too  complex  for  practical  appli- 
cation.    A  form  of  Campbell's  (1960)   trait  validity  may  be  more 
appropriate  for  the  employment  setting. 

A  common  trend  in  almost  all  discussions  of  construct 
validity  involves  testing  of  hypotheses  concerning  the  con- 
struct(s)  in  question.     Is  the  construct  in  question  related 
to  measures  of  behavior  in  situations  where  the  construct  is 
thought  to  be  an  important  variable?     Procedures  for  testing 
such hypotheses can vary greatly from logical analysis, to
correlational studies, to controlled experimental studies.

PACE Development


Several  practical  testing  needs  tended  to  dictate  the 
construct  model  for  the  development  of  the  PACE.     First,  a 
single  test  was  needed  so  that  an  applicant  could  be  con- 
sidered for  more  than  one  occupation  without  taking  a  large 
number  of  tests.     Second,   it  was  hypothesized  that  many 
Federal  occupations  require  similar  abilities  even  though 
the  actual  duties  may  differ.     Third,   it  was  not  techni- 
cally feasible  to  conduct  separate  criterion-related  valid- 
ity studies  in  all  the  occupations.     Thus,  a  construct  model 
was  employed. 

The basic design of the research to develop the PACE,
in simplified form, was as follows:

1.  Analyze  occupations  to  determine  what  duties  are 
performed  by  journeymen. 

2.  Analyze  the  duties  to  determine  what  abilities  are 
important  for  performing  the  duties. 

3.  Select  test  parts  which  measure  these  abilities. 

4.  Develop  a  system  of  differentially  weighting  the 
test  parts  according  to  occupation  requirements. 


Selection of Occupations and Identification of Duties Performed

The  first  step  in  the  development  of  the  PACE  written 
test  was  to  identify  the  occupations  for  which  the  test  would 
be  used.  From  the  pool  of  approximately  120  occupations  to  be 
covered  in  the  PACE,   twenty-seven  occupations  which  had 
accounted  for  approximately  70%  of  the  placements  in  previous 
years  were  selected  for  study. 

The  Civil  Service  Commission  classification  standards 
for  these  27  occupations  were  then  analyzed  to  determine  the 
duties,  or  major  job  components,  performed  by  incumbents 
working  at  the  journeyman  or  full  performance  level  within 
each  occupational  series.     These  duties  were  reviewed  and 
refined  by  subject  matter  experts.     Six  to  20  duties  were 
identified  for  each  occupation. 


Selection  of  Abilities  to  be  Measured 

A tentative listing of the knowledges, skills, abilities,
and other characteristics (KSAO's) that were judged to be
required in these occupations was developed. The KSAO list was
based on a review of the classification standards. The list
included KSAO's that had been described in psychological
literature as underlying successful job performance and
KSAO's that experience with Federal testing had shown to be
related to successful job performance. Through a review of
the literature, six of these abilities were identified as
having potential for inclusion in the written test portion
of the PACE.


Development  of  Weighting  System 

Subject matter experts (generally supervisors) in each
of 27 occupational series rated the duties performed in their
series  for  their  importance  to  successful  performance  in  the 
occupation  and  for  the  relative  amount  of  time  that  journeymen 
spend  on  each  duty.     A  total  of  1,241  subject  matter  experts 
rated  the  duties.     These  persons  also  rated  the  abilities  for 
their  importance  for  successful  job  performance. 

Six  Civil  Service  Commission  psychologists,  experienced 
in  the  use  of  tests  for  employee  selection,  rated  the  import- 
ance of  each  of  the  six  PACE  abilities  for  measuring  the  per- 
formance of  each  duty  for  each  of  the  27  occupations. 

For  each  occupation,  the  duty  importance  and  time  spent 
ratings  obtained  from  the  subject  matter  experts  and  the 
ability  importance  ratings  obtained  from  the  psychologists 
were used to weight the abilities to be measured by the subtests
of the battery. Scores on the PACE subtests were multiplied
by the weights, and the sum of the products was used to
rank order competitors for an occupation.
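
The scoring rule just described can be sketched as a weighted sum followed by a sort. The ability names, weights, and scores below are hypothetical placeholders chosen for illustration; the actual PACE weights are not given in this section.

```python
def rank_competitors(scores, weights):
    """Rank competitors by the weighted sum of their subtest scores, highest first."""
    composites = {
        name: sum(weights[ability] * value for ability, value in subtests.items())
        for name, subtests in scores.items()
    }
    return sorted(composites.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weighting pattern for one occupation.
weights = {"ability_1": 2, "ability_2": 1, "ability_3": 1}

# Hypothetical subtest scores for two competitors.
scores = {
    "competitor_a": {"ability_1": 70, "ability_2": 90, "ability_3": 80},
    "competitor_b": {"ability_1": 85, "ability_2": 75, "ability_3": 70},
}

ranking = rank_competitors(scores, weights)
print(ranking)  # [('competitor_b', 315), ('competitor_a', 310)]
```

Under a different weighting pattern the same two competitors could be ranked in the opposite order, which is the point of weighting the abilities by occupation.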

Seven weighting patterns emerged for all 27 occupations.
One ability (long-term memory) was eliminated since the testing
literature did not contain any tests suitable for use in
a short-term testing session. When this test was eliminated,
six weighting patterns emerged for the 27 occupations, two
of the weighting patterns covering 23 of the occupations.


Development  of  the  Ability  Measures 

Literature in the field of psychometrics was reviewed in
order to find ways to measure the abilities. The most important
sources of suitable tests were the works of French (1951)
and French, Ekstrom, and Price (1963). The questions developed
for the PACE correspond to the question types contained in these
works. The major differences between the French questions and
the PACE questions lie in the modifications made to develop a
selection instrument which could be objectively scored by
machine.


Criterion-Related Validity Studies


As soon as the PACE written test was constructed, follow-up
research was begun to further develop the empirical base
for technical support of the test and of the entire system of
relating abilities to job duties. What we are testing with
the criterion-related studies is a system of identifying and
weighting ability constructs which underlie job performance.
If the criterion-related validity studies demonstrate
empirically that abilities do indeed underlie job performance,
this lends support for the entire system. It is then
not necessary to perform criterion-related validity studies
in each specific occupation included in the examination.

A  series  of  studies  was  planned,   in  which  test  scores  of 
job  incumbents  were  to  be  related  to  the  scores  of  the  same 
incumbents  on  certain  specifically  prepared  measures  of  job 
performance. The basic design of these studies, for each
occupation studied, can be outlined as follows:

1.  Determine  what  journeymen  do  on  the  job  -  that  is, 
conduct  a  job  analysis. 

2.  Use  the  job  analysis  to  develop  measures  of  job 
performance. 

3.  Determine  the  statistical  relationship  between  per- 
formance on  the  test  and  performance  on  the  job. 


Occupations Studied

Social Insurance Claims Examining. The Social Insurance
Claims Examining occupation is unique to the Social Security
Administration. Employees within this occupation evaluate
claims for retirement and health insurance, calculating applicable
rates of annuity after the claim is approved and as
benefits are increased by changes in the Social Security laws.
Claims Authorizer is the title for the most complex job type
within the occupation. Claims authorizers work only on the
initial claim, evaluating its legitimacy and calculating the
amount of benefits to be paid.

Internal Revenue Officers. Internal revenue officers investigate
delinquent taxpayer accounts, both individual and
corporate. The revenue officer must secure and analyze financial
information such as profit and loss statements, sales
and expense figures, or market value of taxpayer's property.

Revenue officers are empowered to institute levies, attach
taxpayers' income, and seize and sell taxpayers' property.
Before resorting to such enforced collection action, revenue
officers explore alternative methods such as arranging for
installment payments.

Customs Inspection. The mission of the Customs Service
is  to  assess  and  collect  customs  duties  on  imported  merchan- 
dise, to  prevent  fraud  and  smuggling,  and  to  control  carriers, 
persons,  and  articles  entering  and  departing  the  United  States. 
Customs  enforces  its  own  as  well  as  some  400  laws  and  regula- 
tions for  40  other  Federal  agencies.     The  primary  function  of 
the  customs  inspector  is  to  process  people  and  merchandise 
coming  into  the  U.  S.,  to  protect  the  revenue  against  fraud 
and  theft,  and  to  keep  items  harmful  to  our  welfare  out  of 
the  country.     Customs  inspectors  work  at  airports,  seaports, 
and  border  points  processing  passengers  and  cargo. 


Job  Analysis 

In  each  occupation  a  detailed  job  analysis  was  conducted 
through  the  use  of  a  task  inventory,  a  listing  of  the  tasks 
performed by job incumbents. Journeymen in each occupation
identified the tasks performed in these occupations. Claims
authorizers identified 528 tasks, internal revenue officers
identified 260 tasks, and customs inspectors identified 494
tasks.

Journeymen  were  then  asked  to  indicate  whether  or  not 
they  performed  each  task  and  to  indicate  the  relative  amount 
of  time  spent  on  each.     This  rating  was  made  on  a  seven  point 
relative-time-spent  scale  ranging  from  "very  much  below  aver- 
age" to  "very  much  above  average." 

Responses  to  the  task  inventory  were  analyzed  by  means 
of  the  Comprehensive  Occupational  Data  Analysis  Program  to 
determine  the  relative  amount  of  time  spent  in  performing 
each  task  by  all  journeymen.     The  relative  amount  of  time 
spent  in  performing  each  task  is  a  measure  of  its  relative 
importance.     An  additional  analysis  was  performed  in  the  cus- 
toms inspector  and  claims  authorizer  samples  to  determine  if 
all  journeymen  in  the  sample  were  performing  similar  tasks. 
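
One conventional way to turn relative-time-spent ratings into a task-level measure is sketched below: each journeyman's ratings are rescaled to percentages of that person's total, and the percentages are averaged across journeymen. Whether the analysis program used in these studies computed exactly this is an assumption, and the task names and ratings are hypothetical.

```python
# Hypothetical ratings: 0 = task not performed, 1-7 = relative time spent.
INCUMBENTS = [
    {"task_1": 4, "task_2": 4, "task_3": 0},
    {"task_1": 2, "task_2": 6, "task_3": 2},
]


def percent_time_spent(ratings):
    """Rescale one incumbent's ratings so that they sum to 100 percent."""
    total = sum(ratings.values())
    return {task: 100.0 * rating / total for task, rating in ratings.items()}


def average_percent_time(all_ratings):
    """Mean percent time spent on each task across all incumbents."""
    per_person = [percent_time_spent(r) for r in all_ratings]
    tasks = all_ratings[0].keys()
    return {t: sum(p[t] for p in per_person) / len(per_person) for t in tasks}


AVERAGES = average_percent_time(INCUMBENTS)
for task, pct in sorted(AVERAGES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task}: {pct:.1f}%")
```

Sorting tasks by the averaged percentage gives the ordering of relative importance that the text describes.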


Measures  of  Job  Performance 

Results  from  the  task  inventory  were  used  in  the  devel- 
opment of  the  measures  of  job  performance  for  each  occupation. 
Four  measures  of  job  performance  were  developed. 

Job Information Test. In each study the job information
test was a multiple choice test requiring one hour to complete.
Items for the tests were developed by subject matter experts
in  the  field  and  were  designed  to  measure  the  job  knowledge 
required  to  perform  the  duties  on  which  the  journeymen 
spend  the  greatest  amount  of  time. 

Work Samples. Work samples are designed to be relevant
approximations to the work actually performed on the job. In
the claims examiner study the work sample consisted of a standardized
claim which had to be adjudicated. The claims examiner
was instructed to treat the claim as one that he would receive
during the performance of his regular duties and to take the
necessary appropriate action that he would normally take.

The  work  sample  in  the  internal  revenue  officer  study 
consisted  of  five  taxpayer  delinquent  accounts  in  which  the 
revenue  officer  had  to  make  various  collection  decisions 
(e.g.,  seize  property,  levy).     The  case  folders  contained 
sufficient  information  to  make  the  necessary  collection  de- 
cisions and  closely  resembled  the  actual  case  folders  used 
in  the  Internal  Revenue  Service. 

For the customs inspector study a novel videotape simulation
was developed. Four sequences of customs activities
were  shown  (e.g.,  passenger  processing,  vessel  clearance, 
search,  seizure,  and  arrest).     Upon  completion  of  each 
sequence  the  customs  inspectors  were  required  to  complete 
appropriate  customs  documents,   identify  mistakes  made  during 
the  televised  sequence,  and  recommend  proper  performance. 

Each  work  sample  required  one  hour  and  fifteen  minutes 
to  complete. 

Supervisory  Rating  Form.     The  supervisory  rating  form 
was  a  tailor-made  rating  form  designed  to  record  a  first- 
level  supervisor's  rating  of  the  performance  of  the  subor- 
dinate journeymen.  The  rating  scales  were  developed  to  cor- 
respond to  the  duties  identified  in  the  task  analysis.  Each 
supervisor  rated  his  journeymen  on  different  categories  of 
performance  for  each  of  the  major  duties  identified  in  the 
task  inventory.  Scale  points  describing  effective  and  inef- 
fective performance  were  developed  for  each  scale  on  the 
rating  form. 

Supervisory  Ranking  Form.     The  supervisory  ranking  form 
contained  the  same  description  of  the  job  duties  as  the 
supervisory  rating  form  but  contained  no  scale  points 
describing effective and ineffective performance. Each
supervisor had to rank his subordinates with respect to each
of the major duties identified for each occupation. This criterion
measure was not used in the internal revenue officer
study.

Success  in  Training.     Training  success  measures  were 
available  for  a  sample  of  claims  examiners.     Training  suc- 
cess was  measured  by  averaging  five  training  performance 
measures  administered  during  the  five  phases  of  the  train- 
ing program.     These  training  performance  measures  included 
actual work samples (i.e., working on an actual disability
claim) in addition to the traditional multiple-choice type
questions. 


Research  Participants 

Two hundred and thirty-one claims authorizers, 305 internal
revenue officers, and 190 customs inspectors at various
locations throughout the U. S. were administered the PACE
and  the  criterion  instruments.     The  total  testing  time  for 
each  participant  was  approximately  8  hours. 


Relationship Between PACE and Job Performance

The  total  score  on  the  PACE  test  was  significantly  related 
to  job  performance  as  measured  by  all  the  measures  of  job  per- 
formance for  the  claims  authorizer  and  internal  revenue  officer 
studies.     For  the  customs  inspector  occupation,  PACE  scores  were 
significantly  related  to  performance  on  the  job  information  test 
and  the  work  sample  but  not  the  supervisory  ratings  and  rankings. 
The  pattern  of  validity  coefficients  was  similar  across  occupa- 
tions with  a  median  coefficient  of  .40.     These  results  indicate 
that  persons  who  score  high  on  the  PACE  tend  to  perform  better 
on  the  job. 

Comparisons  were  also  made  of  different  procedures  for 
weighting  the  subtests  of  the  PACE.     The  construct  weights 
which  are  being  used  operationally  produced  validities  that 
were  essentially  as  high  as  those  obtained  by  other  weighting 
procedures.

The  correlation  obtained  between  PACE  and  training  success 
for  claims  examiners  indicates  that  PACE  is  a  valid  predictor 
of  training  success. 

These  highly  consistent  results  provide  further  support  for 
the  construct  validity  of  the  weighting  system  used  in  the  de- 
velopment of  the  PACE. 

The purpose of our research program on validity generalization
has been to test one of the orthodox doctrines of personnel psychology:
the belief in the situational specificity of employment test
validities (Schmidt & Hunter, 1977). This belief has been founded
on the empirical fact that considerable variability is observed from
study to study in raw validity coefficients even when jobs and tests
appear to be similar or essentially identical (Ghiselli, 1966). The
explanation that has developed for this variability is that the factor
structure of job performance is different from job to job and that the
human observer or job analyst is simply too poor an information
receiver and processor to detect these subtle but important differences.
Until recently, most industrial psychologists accepted this explanation
and concluded that empirical validation is required in each situation,
and that validity generalization is essentially impossible (Albright,
Glennon, & Smith, 1963, p. 18; Ghiselli, 1966, p. 28; Guion, 1965,
p. 126). Our work has tested the hypothesis that the variability in the
outcomes of validity studies within job-test combinations is due to
statistical artifacts. This presentation first describes the validity
generalization model used to test this hypothesis and then describes
the model's application to clerical tests and jobs.

Figure  1  shows  how  various  statistical  artifacts  might  act  to  pro- 
duce the  appearance  of  wide  variability  in  validities  when  in  fact  none 
really  exists.  This  figure  shows  what  the  observed  variability  in 
validity  coefficients  across  studies  would  be  if  in  fact  the  true  score 
correlation  between  test  and  criterion  were  equal  at  .60  in  each  set- 
ting and  all  variability  in  results  from  study  to  study  were  due  solely 
to  various  statistical  artifacts. 

The  first  distribution  in  Row  1  shows  the  variability  to  be  ex- 
pected if  only  the  artifact  of  differences  between  studies  in  criterion 
reliability  were  operating.    The  distribution  of  criterion  reliabilities 
assumed  is  shown  in  Table  1. 

The  second  distribution  shows  variability  due  solely  to  differ- 
ences between  studies  in  test  reliability.  The  distribution  of  test 
reliabilities  assumed  is  shown  in  Table  2. 

The  third  distribution  in  Row  1  shows  variability  due  solely  to 
differences  between  studies  in  degree  of  range  restriction.  Range 
restriction  values  used  in  the  computations  are  shown  in  Table  3. 
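
The way a single artifact spreads out observed validities can be sketched numerically. The example below takes the true correlation of .60 and the Table 1 criterion reliabilities from the text and applies the classical attenuation formula; the range restriction term uses the standard formula for direct restriction on the test. The function names, and the choice to apply unreliability before range restriction, are assumptions of this sketch rather than details confirmed by the paper.

```python
import math

TRUE_R = 0.60  # true-score correlation assumed for every setting in Figure 1

# Table 1: assumed criterion reliabilities and their relative frequencies.
CRITERION_REL = [(0.90, 3), (0.85, 4), (0.80, 6), (0.75, 8), (0.70, 10),
                 (0.65, 12), (0.60, 14), (0.55, 12), (0.50, 10), (0.45, 8),
                 (0.40, 6), (0.35, 4), (0.30, 3)]


def attenuate(rho, r_yy=1.0, r_xx=1.0, u=1.0):
    """Observed validity implied by a true correlation rho.

    r_yy, r_xx: criterion and test reliabilities.
    u: restricted SD divided by unrestricted SD (1.0 = no restriction).
    Applying unreliability first and range restriction second is an
    assumption of this sketch.
    """
    r = rho * math.sqrt(r_yy * r_xx)
    return u * r / math.sqrt(1.0 + (u * u - 1.0) * r * r)


def weighted_mean_sd(pairs):
    """Mean and SD of values weighted by their relative frequencies."""
    total = sum(f for _, f in pairs)
    mean = sum(v * f for v, f in pairs) / total
    var = sum(f * (v - mean) ** 2 for v, f in pairs) / total
    return mean, math.sqrt(var)


# Distribution of observed validities if only criterion reliability varied.
observed = [(attenuate(TRUE_R, r_yy=rel), f) for rel, f in CRITERION_REL]
mean_rel, _ = weighted_mean_sd(CRITERION_REL)
mean_obs, sd_obs = weighted_mean_sd(observed)
print(f"mean reliability {mean_rel:.2f}; "
      f"observed validity mean {mean_obs:.3f}, SD {sd_obs:.3f}")
```

Passing `r_xx` or `u` values instead of `r_yy` gives the corresponding single-artifact distributions for test reliability and range restriction.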

The single distribution in Row 2 shows the variability produced
by the three artifacts in Row 1 operating simultaneously. Even though
we have not yet introduced sampling error, it is obvious that observed
variability from study to study is already substantial. The distributions
in Row 3 show how artifactual variance increases still further when
ordinary sampling error is added. The three distributions illustrate
expected variability when studies are all based on sample sizes of
50, 100, and 150, respectively. The distributions based on N's of
50 and 100 are probably the most realistic. These standard deviations
are very similar to empirically observed standard deviations, as we
will see in this study.

Figure  1  illustrates  the  effects  of  only  four  artifactual  sources 
of  variance: 

1.  differences  between  studies  in  criterion  reliability; 

2.  differences  between  studies  in  test  reliability; 

3.  differences  between  studies  in  range  restriction;  and 

4. sampling error (i.e., variance due to N < ∞).

There  are  at  least  three  additional  artifactual  sources  of  variance: 

5.  differences  between  studies  in  amount  and  kind  of  criterion 
contamination  and  deficiency; 

6.  computational  and  typographical  errors;  and 

7.  slight  differences  in  factor  structure  between  tests  of  a 
given  type  (e.g.,  arithmetic  reasoning  tests). 

The  full  variance-components  model  resulting  when  all  of  the  above 
sources  of  artifactual  variance  are  considered  is  outlined  in  Appendix 
A. 

How  could  one  test  the  hypothesis  of  situational  specificity  with 
real  data?    Conceptually,  this  test  is  quite  simple.    Suppose,  for 
example,  a  researcher  had  100  validity  coefficients  relating  tests  of 
perceptual  speed  to  proficiency  in  clerical  work.    He  or  she  need  only 
convert  the  validities  to  Fisher's  z,  compute  the  variance  of  this 
distribution,  and  subtract  variance  due  to  each  of  the  artifactual 
sources  from  this  total  variance.    If  one  finds  that  artifacts  account 
for  all  or  essentially  all  of  the  variance,  the  hypothesis  of  situa- 
tional specificity  is  rejected.     If  this  is  the  case,  validity  gen- 
eralization  is  obviously  no  longer  a  problem,  since  the  observed  vari- 
ation in  validity  results  will  have  been  shown  to  be  a  result  of  the 
operation  of  statistical  artifacts. 
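
The conceptual test just described can be sketched in a few lines. The validity coefficients and artifact variances below are hypothetical placeholders, the variance is unweighted for simplicity, and flooring the residual at zero is an assumption of this sketch.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def residual_variance(validities, artifact_variances):
    """Observed variance of z, total artifactual variance, and what is left."""
    zs = [fisher_z(r) for r in validities]
    mean_z = sum(zs) / len(zs)
    observed = sum((z - mean_z) ** 2 for z in zs) / len(zs)
    predicted = sum(artifact_variances)
    # Floor at zero: artifacts cannot explain more than all of the variance.
    return observed, predicted, max(observed - predicted, 0.0)


# Hypothetical validity coefficients and artifact variance estimates.
VALIDITIES = [0.10, 0.20, 0.30, 0.40]
ARTIFACTS = [0.010, 0.002]

obs_var, art_var, left_over = residual_variance(VALIDITIES, ARTIFACTS)
print(f"observed {obs_var:.4f}, artifactual {art_var:.4f}, "
      f"residual {left_over:.4f}")
```

A residual at or near zero is the outcome under which the hypothesis of situational specificity would be rejected.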

Method


Compilation of Validity Distributions

The process of compiling a data base of sufficient scope and size
to permit a large-scale test of the model was undertaken in two stages:
first, we developed a classification and coding system that would
enable us to capture all potentially relevant data from validity studies;
second, we made an extensive search of published and unpublished validity
studies and recorded the information in these studies according to
our coding system. We selected clerical occupations as one of our initial
areas of investigation because of the large number of validity
studies that have been conducted on such occupations.

Tests were classified using a system partially adapted from Ghiselli
(1966, pp. 15-21) and Dunnette (Note 1). This system is shown in Appendix
B. Ten general categories of test types were established, most of
which  represent  a  construct  or  ability  factor  found  in  the  psychometric 
literature  (e.g.,  verbal  ability,  quantitative  ability,  perceptual 
speed).    Categories  for  general  intelligence  tests  (consisting  of  verbal, 
quantitative,  and  abstract  reasoning  or  spatial  ability  components), 
so-called  "clerical  aptitude"  tests  (consisting  of  verbal,  quantitative, 
and  perceptual  speed  components),  performance  tests  (e.g.,  typing  or 
dictation  tests),  and  motor  ability  tests  (consisting  of  various  types  of 
finger,  manual,  and  arm  dexterity  tests),  were  included  because  of  their 
relatively common use in clerical selection, even though they do not
represent  pure  constructs  in  the  factor  analytic  sense.    Within  each 
general  test  type  category  codes  were  developed  for  the  specific  item 
types  most  commonly  used  as  measures  of  that  factor  or  test  type  (e.g., 
the  verbal  ability  test  type  category  included  such  item  type  categories 
as  reading  comprehension,  vocabulary,  grammar,  spelling,  and  sentence 
completion). 

Clerical  jobs  were  classified    using  a  slightly  modified  version 
of  the  Dictionary  of  Occupational  Titles  (DOT)  classification  system 
(U.S.  Department  of  Labor,  1965;  Pearlman,  Note  2).      This  coding  scheme 
is  shown  in  Appendix  C.    Under  this  system  clerical  jobs  are  grouped 
into  five  "true"  job  family  categories  (DOT  occupational  divisions  20,  21, 
22,  and  23,  plus  job  groups  240-243  of  occupational  division  24),  one 
"miscellaneous" category (DOT job group 249), and two additional categories
developed to handle clerical occupations which were not sufficiently
specified in the original study to permit definitive classification, and
samples representing two or more different job families.

We collected data only from studies which met certain minimum
requirements, including the reporting of: 1) validity results in the
form of a bivariate correlation coefficient uncorrected for either attenuation
or range restriction; 2) sufficient information to classify
the  test  and  job  studied;  3)  sample  size;  and  4)  sufficient  information 
to  classify  the  criterion  as  a  measure  of  either  job  proficiency  or 
training  success.    Data  from  studies  using  such  administrative  cri- 
teria as  turnover,  absenteeism,  and  tardiness  were  not  included. 

The  data  collection  process  included  an  extensive  search  for  both 
published  and  unpublished  validity  studies  of  clerical  jobs.    In  addi- 
tion to  a  thorough  search  of  the  published  literature,  we  reviewed 
most  of  the  major  commercial  test  manuals  for  validity  information, 
utilized  computer  search  services,  called  and  wrote  test  publishers 
to  obtain  unpublished  validity  data,  and  contacted  research  groups, 
private  consulting  firms,  individual  psychologists,  and  government  and 
military  personnel  psychologists.    We  ultimately  succeeded  in  locating 
3,300  validity  coefficients  for  a  variety  of  clerical  jobs  and  tests. 
These  represented  669  independent  samples.    Approximately  two-thirds 
came  from  unpublished  studies.    Of  the  3,300  coefficients,  2,718  are 
based  on  overall  job  proficiency  or  performance  criteria  and  582  are 
based  on  criteria  of  training  success.    Analysis  of  the  validities 
based  on  training  criteria  is  not  included  in  this  study. 

Data  Analysis 

The validity data were keypunched, entered into a computer file,
and  sorted  into  frequency  distributions  according  to  the  job  and  test 
type  categories  into  which  they  had  been  classified.    The  distribution 
of  validity  coefficients  across  the  eight  job  categories  and  ten  test 
types  is  shown  in  Appendix  D.    Within  the  five  categories  of  "true" 
job  families,  33  validity  distributions  were  sufficiently  large  to  per- 
mit analysis. 

To compute the mean and variance of each of our empirical validity
distributions, each coefficient was converted to Fisher's z form and
weighted by its associated sample size to produce more accurate estimates
of these two parameters. The correction for variance due to
sample size was thus a weighted average of the sampling error across
studies.
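
A sample-size-weighted computation of this kind can be sketched as follows. The sketch assumes the standard large-sample sampling variance of Fisher's z, 1/(N - 3), for each study and weights every term by N; the exact expression used in the study may differ, and the two (r, N) pairs are hypothetical.

```python
import math


def weighted_z_stats(studies):
    """Weighted mean of z, weighted variance of z, and weighted
    sampling-error variance for a list of (r, n) pairs.

    Assumes var(z) = 1 / (n - 3) for each study and weights by n.
    """
    total_n = sum(n for _, n in studies)
    zs = [(math.atanh(r), n) for r, n in studies]
    mean_z = sum(z * n for z, n in zs) / total_n
    var_z = sum(n * (z - mean_z) ** 2 for z, n in zs) / total_n
    var_e = sum(n * (1.0 / (n - 3)) for _, n in studies) / total_n
    return mean_z, var_z, var_e


# Hypothetical studies: (validity coefficient, sample size).
STUDIES = [(0.20, 53), (0.30, 103)]

mean_z, var_z, var_e = weighted_z_stats(STUDIES)
print(f"mean z {mean_z:.4f}, observed variance {var_z:.5f}, "
      f"sampling-error variance {var_e:.5f}")
```

Weighting by N gives large studies more influence on both the mean and the variance, which is the rationale the text gives for the weighting.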

The information necessary to determine actual values of criterion
reliability, test reliability, and range restriction is not
presented in the vast majority of research studies (Jones, 1950).
Thus  one  must  rely  on  reasonable  assumed  distributions  of  these  effects. 
The  distributions  of  criterion  reliabilities,  test  reliabilities,  and 
range  restriction  effects  assumed  in  this  study  are  those  shown  in 
Tables  1,  2,  and  3,  respectively.    These  distributions  are  probably 
somewhat  conservative  (Schmidt  &  Hunter,  1977),  leading  to  under- 
estimates of  variance  due  to  these  three  statistical  artifacts. 
Criterion  reliabilities  are  for  job  performance  or  proficiency  mea- 
sures, not  measures  of  success  in  training.    The  model  used  in  the 
present  study  is  an  improvement  over  the  model  used  in  Schmidt  and 
Hunter  (1977);  unlike  the  earlier  model,  the  present  model  includes 
a  correction  for  variance  due  to  between-study  differences  in  test 
reliability. 

The procedures by which we computed estimates of variance due to
between-study differences in criterion reliability, test reliability,
and range restriction effects for each validity distribution are presented
in Appendix A. After computation, all four estimates of artifactual
variance (the above three sources plus variance due to sampling
error)  were  subtracted  from  the  observed  variance,  providing  the  final 
estimate  of  true  situational  variance,  i.e.,  variance  due  to  true 
differences  between  jobs  in  the  factor  structure  of  performance. 

No  corrections  have  been  made  in  our  research  for  differences  be- 
tween studies  in  amount  and  kind  of  criterion  contamination  or  defi- 
ciency, for  computational  and  typographical  errors,  or  for  slight  dif- 
ferences between  tests  in  factor  structure  because  it  is  difficult  if 
not  impossible  to  estimate  their  effects.    However,  not  correcting  for 
these  sources  of  error  insures  a  conservative  procedure,  i.e.,  the  cor- 
rected variance  tends  to  overestimate  rather  than  underestimate  true 
variance. 

Results  and  Discussion 

Table 4 compares the empirically observed standard deviations of
the  33  validity  distributions  with  the  standard  deviations  predicted 
solely  on  the  basis  of  test  and  criterion  unreliability  effects,  range 
restriction  effects,  and  sampling  error.    Also  shown  is  the  percent  of 
observed  variance  in  each  distribution  accounted  for  by  these  four 
artifacts,  and  the  total  sample  size  and  number  of  validity  coefficients 
on  which  each  distribution  is  based. 
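
The percent-of-variance figure follows directly from the two standard deviations. In the sketch below the ratio of predicted to observed variance is capped at 100 percent; the cap is an inference from the tabled values, not a rule stated in the text. The three pairs of standard deviations are the first, second, and fifth rows of Table 4.

```python
def percent_variance_accounted(observed_sd, predicted_sd):
    """Percent of observed variance explained by the four artifacts.

    Capped at 100 because the artifacts cannot account for more than
    all of the observed variance (an assumption of this sketch).
    """
    pct = 100.0 * predicted_sd ** 2 / observed_sd ** 2
    return min(pct, 100.0)


# (observed SD, predicted SD) pairs taken from Table 4.
ROWS = [(0.266, 0.174), (0.181, 0.135), (0.155, 0.178)]

for obs_sd, pred_sd in ROWS:
    print(obs_sd, pred_sd, round(percent_variance_accounted(obs_sd, pred_sd)))
```

Rounded to whole percents, these three rows reproduce the tabled values of 43, 56, and 100.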

In  10  of  the  33  cases,  the  predicted  standard  deviations  are 
slightly  larger  than  the  observed  standard  deviations.    These  are  ex- 
actly the  type  of  results  we  would  expect  if  the  situational  spec- 
ificity hypothesis  is  false.    Within  a  given  set  of  validity  distribu- 
tions representing  a  variety  of  job  family-test  type  combinations  there 
are likely to be some distributions in which the three unassessed
sources  of  variance  are  present  to  varying  degrees  and  others  in 
which  these  sources  are  negligible.     In  distributions  of  the  former 
type  we  would  expect  the  predicted  standard  deviation  to  fall  below 
the  observed  standard  deviation  to  varying  degrees.     In  distri- 
butions of  the  latter  type  the  predicted  standard  deviation  would  be 
expected  to  fall  slightly  below  the  observed  standard  deviation 
about  half  the  time  and  to  slightly  exceed  the  observed  standard 
deviation about half the time as a result of minor differences between
the actual artifactual effects and our estimates of them.

Considering  these  distributions  together,  in  only  five  of  the  33 
cases  is  the  percentage  of  observed  variance  accounted  for  less  than 
half,  and  in  only  one  case  is  it  less  than  40  percent.    The  average 
amount  of  variance  accounted  for  is  75  percent.    This  means  that,  in 
general,  the  variance  left  within  which  situational  specificity 
(situational  moderators)  can  operate  is  extremely  limited.    For  many  of 
the  distributions,  no  variance  is  left.    In  20  of  the  33  distributions, 
more than 70 percent of the observed variance is accounted for.

If we look only at the true constructs (eliminating motor ability
tests, performance tests, general intelligence tests, and clerical
aptitude tests), the average amount of variance accounted for is 84
percent. If we could correct for all seven artifactual sources of
variance instead of just four, we conclude that all observed variance
would be accounted for.

Thus  the  evidence  is  strong  that  the  doctrine  of  situational  spec- 
ificity is  false  and  employment  test  validities  can  be  generalized 
across  settings. 

Although  not  shown  in  Table  4,  our  method  also  produces  estimates 
of  the  true  validities  that  should  be  generalized.    These  are  produced 
by  correcting  the  mean  observed  validity  for  range  restriction  and 
criterion  unreliability  using  average  values  of  both.    For  the  true 
constructs  in  Table  4,  these  validities  range,  with  one  exception, 
from  .37  to  .70.    The  average  value  is  .47.    Thus  tests  of  these  kinds 
have  generalizable  and  substantial  validity  for  predicting  proficiency 
in  clerical  work. 
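
The two corrections named above can be sketched with the classical formulas for direct range restriction and for criterion unreliability. The order in which they are applied, the function names, and the mean observed validity of .25 are assumptions made for illustration; the SD ratio of .60 and the reliability of .60 are the expected values given in Tables 3 and 1.

```python
import math


def correct_range_restriction(r, u):
    """Correct r for direct range restriction on the test.

    u is the restricted SD divided by the unrestricted SD.
    """
    big_u = 1.0 / u
    return r * big_u / math.sqrt(1.0 + r * r * (big_u * big_u - 1.0))


def correct_criterion_unreliability(r, r_yy):
    """Correct r for unreliability in the criterion measure."""
    return r / math.sqrt(r_yy)


def estimated_true_validity(mean_r, u, r_yy):
    """Apply the range restriction correction, then the reliability
    correction (this order is an assumption of the sketch)."""
    unrestricted = correct_range_restriction(mean_r, u)
    return correct_criterion_unreliability(unrestricted, r_yy)


MEAN_OBSERVED_R = 0.25   # hypothetical mean observed validity
SD_RATIO = 6.0 / 10.0    # Table 3: expected restricted SD over unrestricted SD
CRITERION_REL = 0.60     # Table 1: expected criterion reliability

estimate = estimated_true_validity(MEAN_OBSERVED_R, SD_RATIO, CRITERION_REL)
print(f"estimated true validity: {estimate:.2f}")
```

No correction for test unreliability appears here, since the text names only range restriction and criterion unreliability for this estimate.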

We  believe  that  application  of  this  model  may  lead  to  fairly 
dramatic  progress  in  the  establishment  of  general  principles  and 
theories  about  trait-performance  relationships  in  the  world  of  work. 
The  first  step  in  the  development  of  general  principles  and  theories 
in  this  or  any  other  area  is  the  establishment  of  stable  patterns  of 
relationships  among  basic  variables.    In  order  to  establish  such  pat- 
terns of  relationships,  it  is  first  necessary  to  demonstrate  that  the 
doctrine  of  situational  specificity  is  false  or  essentially  false. 

If the situational specificity hypothesis is rejected, then it
follows that various constructs (for example, verbal ability)
have invariant population relationships with specified kinds of
performances and job behaviors. The best estimate of this population
value for any construct-performance combination is the fully
corrected mean of the validity distribution. This mean should be corrected
for unreliability in both test and criterion, since the goal
in theoretical research is to reveal relationships among underlying
constructs, independent of measurement problems. We predict that
such research will reveal that the underlying structure of reality
in personnel psychology (that is, the pattern of population parameters
and their relationships) is considerably simpler than has
previously been imagined (Schmidt & Hunter, 1978). The model presented
here thus provides a tool which should enable the field to
move beyond a mere technology to the status of a science.


Table 1

Example of Assumed Distribution of Criterion
Reliabilities Across Studies
(Proficiency Measures)


Reliability    Relative Frequency

    .90                 3
    .85                 4
    .80                 6
    .75                 8
    .70                10
    .65                12
    .60                14
    .55                12
    .50                10
    .45                 8
    .40                 6
    .35                 4
    .30                 3

Note. Expected value (criterion reliability) = .60.


Table 2

Example of Assumed Distribution of
Test Reliabilities Across Studies


Reliability    Relative Frequency

    .90                15
    .85                30
    .80                25
    .75                20
    .70                 4
    .60                 4
    .50                 2

Note. Expected value (test reliability) = .80.


Table 3

Example of Assumed Distribution of Range Restriction
Effects Across Studies


Prior Selection Ratio    SD of Test    Relative Frequency

        1.00               10.00               5
         .70                7.01              11
         .60                6.49              16
         .50                6.03              18
         .40                5.59              18
         .30                5.15              16
         .20                4.68              11
         .10                4.11               5

Note. Expected value (SD) = 6.0.
Table 4

Observed and Predicted Standard Deviations and Percent Variance Accounted For
(Clerical Job Families-Proficiency Criteria)


Test Type                  Job Family                            Total N  No. of r's  Obs. SD(a)  Pred. SD(a)  % Var. Acc. For

General Intelligence       Steno, Typing, & Filing                 3,986      65        .266        .174            43
General Intelligence       Computing & Account Recording           5,433      58        .181        .135            56

Verbal Ability             Steno, Typing, & Filing                16,176     175        .179        .130            53
Verbal Ability             Computing & Account Recording           8,670     110        .160        .132            53
Verbal Ability             Material & Production Recording         1,926      45        .155        .178           100
Verbal Ability             Information & Message Distribution      1,073      14        .165        .147            80

Quantitative Ability       Steno, Typing, & Filing                12,368     130        .148        .138            87
Quantitative Ability       Computing & Account Recording          10,631     140        .171        .149            76
Quantitative Ability       Material & Production Recording         1,641      39        .195        .201           100
Quantitative Ability       Information & Message Distribution      1,110      15        .136        .143           100
Quantitative Ability       Public Contact                            993      13        .064        .144           100

Perceptual Speed           Steno, Typing, & Filing                23,045     269        .190        .139            54
Perceptual Speed           Computing & Account Recording          22,978     321        .168        .151            81
Perceptual Speed           Material & Production Recording         3,574      64        .145        .163           100
Perceptual Speed           Information & Message Distribution      2,002      27        .168        .156            87
Perceptual Speed           Public Contact                          1,151      16        .126        .137           100

Reasoning Ability          Steno, Typing, & Filing                 3,497      36        .134        .123            84
Reasoning Ability          Computing & Account Recording           1,556      27        .205        .169            68
Reasoning Ability          Material & Production Recording         1,114      22        .181        .168            86

Memory                     Steno, Typing, & Filing                 2,471      36        .169        .147            76
Memory                     Computing & Account Recording           1,817      33        .154        .156           100
Memory                     Material & Production Recording         1,086      22        .154        .175           100

Spatial/Mech'l. Abil.      Steno, Typing, & Filing                 2,604      21        .112        .097            76
Spatial/Mech'l. Abil.      Computing & Account Recording           5,265      57        .150        .121            65
Spatial/Mech'l. Abil.      Material & Production Recording           811      18        .160        .184           100

Motor Ability(b)           Steno, Typing, & Filing                 4,045      54        .172        .129            56
Motor Ability(b)           Computing & Account Recording          11,948     131        .132        .117            78
Motor Ability(b)           Material & Production Recording         1,968      27        .131        .133           100
Motor Ability(b)           Information & Message Distribution      1,370      19        .219        .147            45

Performance Tests          Steno, Typing, & Filing                 3,663      39        .348        .164            22
Performance Tests          Computing & Account Recording           1,427      13        .178        .122            47

Clerical Apt. Tests(c)     Steno, Typing, & Filing                 3,915      53        .235        .165            49
Clerical Apt. Tests(c)     Computing & Account Recording           1,645      25        .217        .161            55


(a) In Fisher's z form.
(b) Dotting, tapping, etc. tests; also some manual and arm dexterity tests.
(c) Tests comprised of verbal, quantitative, and perceptual speed components.
Appendix A


Sources of Variance in Distributions of
Validity Coefficients for a Given
Test Type - Job Combination

1.  Error variance due to differences between studies in
    criterion reliability.

2.  Error variance due to differences between studies in
    test reliability.

3.  Error variance due to differences between studies in
    degree of range restriction.

4.  Error variance due to sampling error, i.e., variance
    due to use of N < ∞.

5.  Error variance due to differences between studies in
    amount and kind of criterion contamination and deficiency
    (Brogden and Taylor, 1950).

6.  Error variance due to computational, typographical, etc.,
    errors (Wolins, 1962).

7.  Error variance due to slight differences in factor
    structure of tests measuring the same construct.

8.  Variance due to true differences in factor structure
    between criterion measures, i.e., variance due to true
    situational specificity.


Our hypothesis is:  \sigma^2_8 = 0.  An alternative statement of this
hypothesis is:

    \sigma^2_{total} - \sigma^2_1 - \sigma^2_2 - \sigma^2_3 - \sigma^2_4 - \sigma^2_5 - \sigma^2_6 - \sigma^2_7 = 0

where \sigma^2_k is the variance contributed by source k in the list above
and \sigma^2_{total} is the observed variance of the validity distribution.

I.    Computing variance due to differences between studies in criterion
reliability.

1.  Compute the mean of the raw validity distribution in Fisher's z
(Fz) form and convert it to r.

2.  Correct this raw r for test and criterion unreliability and
range restriction using average values across studies for
these three variables.  (In this study, average assumed
criterion reliability was .60, average assumed test reliability
was .80, and average assumed range restriction was to a SD
of 6.0 from an unrestricted SD of 10.0; see Tables 1, 2, and
3 in text.)  This provides an estimate of the fully corrected
validity.

3.  For each value of assumed criterion reliability, r_{yy_i}, multiply
the fully corrected validity by \sqrt{r_{yy_i}} and convert this
attenuated r to Fz.

4.  Compute \sum Fz_i n_i and \sum Fz_i^2 n_i, where n_i = the relative
frequencies of the criterion reliabilities.

5.  Variance in the Fz distribution of validities due to criterion
reliability differences is then:

    \sigma^2 = \frac{\sum Fz_i^2 n_i}{\sum n_i} - \left( \frac{\sum Fz_i n_i}{\sum n_i} \right)^2
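
The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reliability distribution is the one in Table 1, the fully corrected validity of .50 is a made-up value chosen for the example, and the function and variable names are hypothetical rather than taken from the study.

```python
import math

# Table 1: assumed criterion reliabilities and their relative frequencies.
TABLE1_REL = [.90, .85, .80, .75, .70, .65, .60, .55, .50, .45, .40, .35, .30]
TABLE1_FREQ = [3, 4, 6, 8, 10, 12, 14, 12, 10, 8, 6, 4, 3]


def fisher_z(r):
    """Fisher's z transformation of a correlation."""
    return math.atanh(r)


def variance_due_to_reliability(r_corrected, reliabilities, freqs):
    """Variance in Fz produced by differences in reliability alone.

    Each reliability attenuates the corrected validity by its square root
    (step 3); the frequency-weighted variance of the resulting Fz values
    is then returned (steps 4 and 5).
    """
    total = sum(freqs)
    z = [fisher_z(r_corrected * math.sqrt(rel)) for rel in reliabilities]
    mean_z = sum(n * zi for n, zi in zip(freqs, z)) / total
    mean_z_sq = sum(n * zi * zi for n, zi in zip(freqs, z)) / total
    return mean_z_sq - mean_z ** 2


# Hypothetical fully corrected validity of .50 (an assumption for illustration).
var_criterion = variance_due_to_reliability(0.50, TABLE1_REL, TABLE1_FREQ)
print(round(var_criterion, 5))
```

The same function applies to test reliability if it is given the Table 2 distribution and a validity corrected as described for that case.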


II.   Computing variance due to differences between studies in test
reliability.

1.  Compute the mean of the raw validity distribution in Fz form
and convert it to r.

2.  Correct this raw r for range restriction and for attenuation
due to test unreliability (using average values of both) but
not for attenuation due to criterion unreliability.  This
partially corrected coefficient is the starting value for step 3.

3.  For each value of assumed test reliability, r_{xx_i}, multiply
the coefficient from step 2 by \sqrt{r_{xx_i}} and convert this
attenuated r to Fz.

4.  Compute \sum Fz_i n_i and \sum Fz_i^2 n_i, where n_i = the relative
frequencies of the test reliabilities.

5.  Variance in the Fz distribution due to differences in test
reliability is then:

    \sigma^2 = \frac{\sum Fz_i^2 n_i}{\sum n_i} - \left( \frac{\sum Fz_i n_i}{\sum n_i} \right)^2
III.  Computing variance due to range restriction differences between
studies:

1.  Compute the mean of the validity distribution in Fz form and
convert it to r.  Correct this raw r for mean range restriction
but not for attenuation due to either source of unreliability.

2.  For each value of the restricted standard deviation, use the
following formula to compute the expected restricted r:

    r_i = \frac{u_i R}{\sqrt{(u_i^2 - 1) R^2 + 1}}

where:

    r_i   =  the restricted validity
    R     =  the unrestricted validity
    u_i   =  sd_i / SD
    SD    =  the standard deviation of the test in the
             unrestricted group
    sd_i  =  the standard deviation of the test in the
             restricted group

This formula is obtained by solving Thorndike's (1949, p. 173)
Case II formula for r_i.  (Thorndike's Case II is the model
throughout these analyses; use of Case III would generally
produce very similar results.)

3.  Convert r_i to Fz and compute \sum Fz_i n_i and \sum Fz_i^2 n_i.

4.  Variance due to range restriction differences between studies
is then:

    \sigma^2 = \frac{\sum Fz_i^2 n_i}{\sum n_i} - \left( \frac{\sum Fz_i n_i}{\sum n_i} \right)^2
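
A short Python sketch of the range restriction computation may make the procedure concrete. It assumes the Table 3 distribution (the frequency of 11 for the SD of 7.01 is inferred, since that entry is illegible in the printed table but 11 makes the frequencies sum to 100) and a made-up unrestricted validity of .50. All names are hypothetical.

```python
import math

UNRESTRICTED_SD = 10.0
# Table 3: restricted SDs and relative frequencies.
# The frequency 11 for SD 7.01 is an assumption (illegible in the source).
TABLE3_SD = [10.00, 7.01, 6.49, 6.03, 5.59, 5.15, 4.68, 4.11]
TABLE3_FREQ = [5, 11, 16, 18, 18, 16, 11, 5]


def restricted_r(unrestricted_r, u):
    """Expected restricted validity, with u = restricted SD / unrestricted SD.

    This is Thorndike's Case II formula solved for the restricted r.
    """
    return u * unrestricted_r / math.sqrt((u * u - 1.0) * unrestricted_r ** 2 + 1.0)


def case2_correction(r, u):
    """Usual Case II correction (restricted r back to unrestricted r)."""
    big_u = 1.0 / u
    return r * big_u / math.sqrt(1.0 + r * r * (big_u * big_u - 1.0))


def variance_due_to_range_restriction(unrestricted_r, sds, freqs):
    """Frequency-weighted variance of Fz across the restricted SDs."""
    total = sum(freqs)
    z = [math.atanh(restricted_r(unrestricted_r, sd / UNRESTRICTED_SD)) for sd in sds]
    mean_z = sum(n * zi for n, zi in zip(freqs, z)) / total
    mean_z_sq = sum(n * zi * zi for n, zi in zip(freqs, z)) / total
    return mean_z_sq - mean_z ** 2


# Hypothetical unrestricted validity of .50 (an assumption for illustration).
var_range = variance_due_to_range_restriction(0.50, TABLE3_SD, TABLE3_FREQ)
print(round(var_range, 5))
```

The `case2_correction` helper is included only so the inverse relationship can be checked: correcting a restricted r with the same ratio should return the unrestricted value.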
Appendix B

Test Classification System and Code


General Mental Ability (10)

    10 = intelligence/adaptability

Verbal Ability (11-17)

    11 = verbal ability, nfc(1)
    12 = reading comprehension
    13-17 = grammar, spelling, word fluency, sentence completion

Quantitative Ability (20-28)

    20 = quantitative ability, nfc
    21 = computation (mixed operations)
    22 = arithmetic word problems
    23 = error location
    24 = computation (addition)
    25 = computation (subtraction)
    26 = computation (multiplication)
    27 = computation (division)
    28 = graph and table reading

Reasoning Ability (30-36)

    30 = reasoning ability, nfc
    31 = verbal reasoning (analogies, inference)
    32 = abstract reasoning (figure analogies)
    33 = logical order of events
    34 = letter series
    35 = number series
    36 = judgment

Perceptual Speed (40-49)

    40 = perceptual speed, nfc
    41 = name comparison/checking
    42 = number comparison/checking
    43 = figure comparison
    44 = cancellation
    45 = filing (numbers)
    46 = name and number comparison/checking
    47 = coding
    48 = alphabetizing or name filing
    49 = substitution (letter-digit or digit-symbol)

Memory (50-56)

    50 = memory, nfc
    51 = memory of oral instructions
    52 = classification
    53 = coding
    54 = substitution (letter-digit or digit-symbol)
    55 = number writing
    56 = immediate memory

Spatial and Mechanical Ability (60-65)

    60 = spatial or mechanical ability, nfc
    61 = mechanical knowledge
    62 = spatial relations
    63 = location
    64 = mechanical principles
    65 = pursuit

Motor Ability (70-78)

    70 = motor ability, nfc
    71 = finger dexterity
    72 = hand dexterity
    73 = arm dexterity
    74 = tracing
    75 = tapping
    76 = dotting
    77 = mark making
    78 = aiming

Performance Tests (80-83)

    80 = performance tests, nfc
    81 = typing test
    82 = dictation test
    83 = work sample

Clerical Aptitude Tests (90)

    90 = clerical aptitude (combined verbal, numerical, and clerical
         speed)


(1) Not further classifiable or combination of item types within same test type
Appendix C

Job Classification System and Code
(full D.O.T. code in parentheses)

Stenography, Typing, Filing, and Related Occupations

201 - Secretaries (201.368)
202 - Stenographers (202.388)
203 - Typists (203.588)
204 - Correspondence clerks (204.288)
205 - Personnel clerks (205.368)
206 - File clerks (206.388)
207 - Duplicating-machine operators (207.782)
208 - Miscellaneous office machine operators (208.138, 208.588, 208.782, and 208.885)
209 - Stenography, typing, filing, and related occupations, n.e.c.(1) and mixed samples(2)
260 - Clerk (includes office clerk, general clerk, junior clerk, entry- and
      intermediate-level clerk) (209.588)
261 - Clerk-typist (209.388)
262 - Index clerk (209.588)
263 - Combined samples of clerks, typists, stenographers, and secretaries
264 - Copy holder (209.588) and/or proofreader (209.688)
265 - Pricing clerk (209.588)
266 - Checker II (209.688)

Computing and Account-Recording Occupations

210 - Bookkeepers (210.388)
211 - Cashiers (211.368 and 211.468)
212 - Tellers (212.368)
213 - Automatic data-processing equipment operators (213.382, 213.582, 213.588,
      213.782, and 213.885)
214 - Billing-machine operators (214.488)
215 - Bookkeeping-machine operators (215.388)
216 - Computing-machine operators (216.488)
217 - Account-recording-machine operators (217.388)
219 - Computing and account-recording occupations, n.e.c. and mixed samples
270 - General office clerk (includes senior clerk and administrative clerk) (219.388)
271 - Ward clerk (219.388)
272 - Hand transcriber (219.588)
273 - Toll-bill clerk (includes invoice typist) (219.388)
274 - Budget/fiscal clerk (219.388)
275 - Actuarial clerk (insurance) (219.388)
276 - Accounting clerk (219.488)
277 - Coding clerk (219.388)
278 - Combined samples of computing and account-recording machine operators
279 - Combined samples of bookkeeping, accounting, fiscal, and auditing clerks


(1) Not elsewhere classified.
(2) Samples representing two or more different job codes from the job family.
Appendix C (cont'd.)

Material and Production Recording Occupations

221 - Production clerks (221.168 and 221.388)
222 - Shipping and receiving clerks (222.138, 222.387, 222.587, and 222.687)
223 - Stock clerks and related occupations (223.387)
224 - Weighers (224.487)
229 - Material and production recording occupations, n.e.c. and mixed samples

Information and Message Distribution Occupations

230 - Messengers, errand boys, and office boys and girls (230.368, 230.868, and 230.878)
231 - Mail clerks (231.588)
232 - Post office clerks (232.368)
233 - Mail carriers (233.388)
234 - Mail-preparing- and mail-handling-machine operators (234.582 and 234.885)
235 - Telephone operators (235.862)
236 - Telegraph operators (236.588)
237 - Receptionists and information clerks (237.368)
239 - Information and message distribution occupations, n.e.c. and mixed samples

Public Contact Occupations

240 - Collectors (240.368)
241 - Adjusters (241.168 and 241.368)
242 - Hotel clerks (242.368)
243 - Direct service clerks (243.368)

Miscellaneous Clerical Occupations

280 - Enumerator/survey worker (249.268)
281 - Library assistant (249.368)
282 - Order clerk (249.368)
283 - Telephone ad-taker (249.368)
284 - Securities clerk (249.368)
285 - Engineering clerk (249.388)
286 - Service representative (includes contract clerk) (249.368)
287 - Claims examiner (249.268)

Additional Categories

250 - All other clerical occupations not otherwise classifiable or not specified
251 - Samples which represent two or more different job codes from different job
      families
Appendix D


Number of Validity Coefficients in Clerical Validity
Data File by Test Type and Job Family
(Proficiency Criteria)


                                              Job Family

Test Type                       20    21    22    23    24    28    25    26    Total

General Intelligence            65    58     9     6     6    14    28     4      190
Verbal Ability                 175   110    45    14     4    19    60     8      435
Quantitative Ability           130   140    39    15    13    21    76    11      445
Reasoning Ability               36    27    22     0     3     6    21     0      115
Perceptual Speed               269   321    64    27    16    35   108    18      858
Memory                          36    33    22     2     3     7     4     2      109
Spatial/Mechanical Ability      21    57    18     6     3     5     0     1      111
Motor Ability                   54   131    27    19    11    13     6     4      265
Performance Tests               39    15     0     0     0     0     1     2       57
Clerical Aptitude Tests         53    25     9     3     0     3    37     3      133

Total                          725   856   159    89    59   123   341    53    2,718

Job family codes defined:

20 - Stenography, Typing, Filing, and Related Clerical
21 - Computing and Account-Recording Clerical
22 - Material and Production-Recording Clerical
23 - Information and Message Distribution Clerical
24 - Public Contact Clerical
28 - Miscellaneous Clerical (D.O.T. Group 249 Jobs)
25 - Unspecified Clerical
26 - Mixed Samples
Synthetic Validity
Marvin H. Trattner


History

Guion (1965) defines synthetic validity as the inference of validity
from the predetermined validities of a test for specific components of a
job.  Guion's approach allows one to infer test validity for an occupation
when no test or criterion data are collected for the specific
occupation.  The approach to be described here enlarges upon Guion's
definition by permitting, under certain conditions, the calculation of the
validity coefficient when no test and criterion data have been collected
for an occupation.  With the use of this approach, test validities can be
calculated for occupations where it is infeasible, for a variety of
reasons, to conduct traditional studies.  The approach to be described
is an application of Ernest Primoff's J-coefficient.  It is also based
on Vern Urry's recent extensions of the J-coefficient formula.


Description 

The following are the steps in applying the synthetic validity
paradigm.

1.  Select the class of occupations for which the test will be used.
For the class, select the most populous occupations.  The class should
consist of occupations in which similar tasks are performed at
approximately the same difficulty level.  For instance, for the clerical class
select Clerk Typist, Secretary, File Clerk, Receptionist, Typist, etc.

2.  Define the major job duties for the occupational class.  A duty is
defined as a major segment, component, or module of work performed in an
occupation.  It could be the only work performed in a specific subtype of
the occupation.  The same duty may occur in several of the different
occupations in the class.  The following are good examples of clerical
duties:  take dictation, compose routine correspondence, type simple
material, type technical material.  The job duty is conceptually similar
to the "work behaviors" defined in the new Uniform Guidelines on
Employee Selection Procedures.

3.  Determine the test validity for measuring duty performance for
several occupations in the class.  Correlate the test score with duty
performance measures separately for the most populous occupations in the
class.

4.  Calculate the test's synthetic validity coefficient for a specific
occupation.  The synthetic validity coefficient is the correlation
of the weighted sum of the duty performance scores with the test score
after the duty scores are weighted for importance for the specific
occupation.  Another way to calculate the test's synthetic validity precisely
is to weight the individual test by duty validity coefficients for duty
importance and sum the weighted validity coefficients with the use of the
correlation of weighted sums formula.  The formula gives the correlation
of the test score with the sum of the weighted duty performance scores.
Once stable test by duty validity coefficients and duty intercorrelation
coefficients are obtained for the occupations in a class, they can be used
to estimate the test's validity for any occupation in the class.  The only
additional data required to obtain the validity estimate are ratings of duty
importance for success in the occupation.  The first level supervisors
are employed to rate the duties for importance for occupational success.
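
The correlation of weighted sums computation in step 4 can be sketched as follows. This is a minimal illustration, assuming standardized duty scores so that the composite's variance depends only on the weights and the duty intercorrelations. The weights, validities, and intercorrelations below are invented for the example, and the function name is hypothetical.

```python
import math


def synthetic_validity(weights, validities, duty_corr):
    """Correlation of a test with a weighted sum of standardized duty scores.

    weights    : importance weights for the duties (e.g. supervisor ratings)
    validities : test-by-duty validity coefficients
    duty_corr  : duty intercorrelation matrix (1.0 on the diagonal)
    """
    k = len(weights)
    covariance = sum(w * v for w, v in zip(weights, validities))
    composite_var = sum(
        weights[i] * weights[j] * duty_corr[i][j]
        for i in range(k)
        for j in range(k)
    )
    return covariance / math.sqrt(composite_var)


# Invented values for three duties (assumptions for illustration only).
weights = [3.0, 2.0, 1.0]
validities = [0.30, 0.25, 0.10]
duty_corr = [
    [1.0, 0.4, 0.2],
    [0.4, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

r_synthetic = synthetic_validity(weights, validities, duty_corr)
print(round(r_synthetic, 3))
```

Because only the relative sizes of the weights matter, multiplying every importance rating by the same constant leaves the result unchanged.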

If all occupational duties are defined and precise estimates of test
validities for the duties are obtained, then the synthetic validity
coefficient is precisely equivalent to the actual test validity coefficient.
It is assumed that the test correlation with duty performance is constant
across occupations.  If all major occupational duties are not defined,
then the synthetic validity coefficient is a lower bound for the actual
validity coefficient.  If the duty performance scores correlate
inconsistently with each other and with the test across occupations in the
class, then the synthetic validity coefficient cannot be estimated.


Method 

In order for the research to succeed, two major problems will need to
be overcome.  The first is to define a comprehensive set of duties
that describes the work performed in the occupational class.  Where the same
duty is performed in different occupations, it should be performed at the
same level of difficulty and consist of very similar tasks.  The second
is that the validity coefficients for the test for measuring
the duties must be consistent and statistically significant across occupations.

The  method  to  be  described  should  achieve  the  desired  results. 

1.  Assemble subject matter experts (SMEs) to define the duties for
the class.  The SMEs would be senior journeymen and first level
supervisors in the occupations.  First ask the SMEs to define the duties in
their own occupation.  Then ask the assembled SMEs to generalize duties
across occupations.  A generalized duty should define the same work tasks
at the same level of difficulty occurring in different occupations.  The
subject matter specific to an occupation should be omitted if it is
unrelated to the duty difficulty level.  For instance, the subject matter
of the technical material that is typed would probably be irrelevant in
determining an aptitude test's validity for measuring skill in typing
technical material.  Consequently, reference to the technical material
should be omitted from the duty definition.  Where the same duty occurs
at different levels of difficulty, the duty should be split into
several which describe the differing difficulty levels.

2.    Determine  the  test  validity  for  measuring  duty  performance  for 
the  class  of  occupations. 

It will be necessary to correlate test scores with duty performance
scores for incumbents in the populous occupations in the class.  Since
there may be as many as fifty defined duties, most of which would not
apply to any one occupation, the only feasible way to measure duty
performance would appear to be with the use of a rating of duty
performance.  It would be prohibitively expensive to construct work samples
for fifty duties.  Furthermore, it would be necessary that the work
samples have subject matter content that would be equally familiar to
all research participants.  Work samples with neutral subject matter
content in many cases would closely resemble the aptitude test for
which they were designed as criteria.  These kinds of work samples
might not be scientifically or legally defensible.

We are all aware that ratings are a very questionable kind of
performance measure.  When employed as criteria they are less likely
to be significantly correlated with selection instruments than other
kinds of job performance measures.  They are used here not for the
sake of convenience but out of necessity.  The following are some of
the steps we will take to maximize the probability of success for
the project.

1.  Employ a large N for each occupation.

2.  Select occupations for study with very specific performance
standards.  These would tend to be production oriented occupations.

3.  Use research participants at grade levels below the journeyman.

4.  Identify research participants only by a code number.  In this
way we hope to encourage more candid and hence more valid ratings.

5.  Obtain performance ratings from the first level supervisors
and the research participants themselves.  Combine the two sets of ratings
to obtain the performance measure.  The assumption to be tested is that
research participants are best qualified to evaluate their relative
performance on the duties and the first level supervisors are best
qualified to evaluate the research participants' overall performance level.

6.  Carefully scale rating forms.

7.  Include impossible end points on the rating scales and eliminate
raters who use them.

8.  Train raters and involve them in the construction of the
rating forms.

9.  Get reliability estimates for the ratings by comparing
ratings given by present and former first level supervisors of the
research participants.

The synthetic validity paradigm can be applied to test selection
with multiple regression.  The validity coefficient for each test can
be synthetically calculated and employed along with the test
intercorrelations to select and weight tests in a battery.
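
One way to picture this use of multiple regression is the sketch below. It assumes standardized scores, so the regression weights are the solution of the normal equations formed from the test intercorrelations and the synthetic validities. The two-test values are invented, and the helper names are hypothetical; the small Gaussian elimination routine is used only to keep the example free of outside libraries.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def battery_weights(test_corr, validities):
    """Standardized regression weights and multiple R for a test battery."""
    betas = solve_linear(test_corr, validities)
    r_squared = sum(b * v for b, v in zip(betas, validities))
    return betas, math.sqrt(r_squared)


# Invented synthetic validities and intercorrelation for two tests.
test_corr = [[1.0, 0.5], [0.5, 1.0]]
validities = [0.30, 0.20]
betas, multiple_r = battery_weights(test_corr, validities)
print([round(b, 3) for b in betas], round(multiple_r, 3))
```

A test whose weight is near zero adds little beyond the others and would be a candidate for dropping from the battery.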

If a consistent matrix of significant test validity coefficients
for duties can be developed for a class of occupations, it is probable
that the matrix would be applicable across agencies.  It is probably
true that variance due to duty performance in different occupations
is much greater than variance due to employer.  A test that
correlates with specific duty performance for one employer should
correlate the same way for another.  It follows that private industry
employers, state and local governments, and Federal agencies could
each profitably pool their research and development efforts in
constructing synthetically validated test batteries.
A Primer of Item Response Theory


(an overview of a book by the same title*)

Thomas A. Warm
U.S. Coast Guard Institute
Oklahoma City, Oklahoma


As I look out over my audience here, I see several people who really ought
to be up here instead of me.  I'm not an expert in the subject of Item Response
Theory.  Less than two years ago I had not even heard of item response theory.
I discovered its existence in January of last year while thumbing through some
journals.  During the next several months I spent several hundred hours trying
to understand it.  It wasn't until last year's MTA, when I was able to pick the
brains of several people, that it all finally fell into place.

Soon  thereafter  it  occurred  to  me  that  Item  Response  Theory  really  need  not 
have  been  all  that  complicated,  if  someone  had  just  sat  down  with  me,  and  explained 
it  in  simple  language  and  with  a  few  simple  examples. 

The thought that all that work could have been unnecessary disturbed me to
the point that I decided that no one ought to have to go through what I went
through to learn about what I consider to be the most important development
in the history of testing.

With that idea in mind I wrote this book.  I simply put into it everything
that I wish someone had told me a year and a half ago.  What I intend to do today
is merely to introduce some of the basic concepts of IRT.  Then, if you're
interested, you can get the rest of the theory from the book, hopefully.

Item Response Theory (abbreviated IRT) deals with multiple-choice questions
on an ability test.  But when I say "ability" I do not mean only the so-called
"pure" abilities in testing, such as verbal ability, numerical ability, and spatial
ability.  I also mean job knowledge tests, and subject matter tests.  IRT applies
to all of these types of tests.  It may also be applicable to personality testing,
but very little work has been done on this application.  It applies very well to
free response (fill in) questions in addition to multiple-choice items.

Let's say we take a group of people with a wide range on some ability, say
arithmetic.  And let's say we give two arithmetic tests to this group; one of
the tests is easy and the other is hard.

Then  we  will  find  these  two  distributions  for  the  two  tests. 


*Copies of this book may be obtained from the National Technical Information
Service, U.S. Dept. of Commerce, Springfield, VA 22161 by sending $8.00 for
papercopy or $3.00 for microfiche.  Use item # AD-A063072.

The easy test will be skewed to the left because most people will score high,
since it is an easy test.  The hard test will be skewed to the right, because
most people will score low, since it is a hard test.
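
A small simulation can illustrate the two skewed distributions. It is only a sketch under assumptions that are not part of the talk at this point: abilities are drawn from a standard normal distribution, every item on a test has the same difficulty, and the chance of a correct answer follows a simple logistic curve in the difference between ability and difficulty. The difficulty values, sample size, and function names are all invented for the example.

```python
import math
import random


def simulate_scores(abilities, difficulty, n_items, rng):
    """Number-right scores on a test whose items all share one difficulty."""
    scores = []
    for ability in abilities:
        p_correct = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
        scores.append(sum(rng.random() < p_correct for _ in range(n_items)))
    return scores


def skewness(values):
    """Moment coefficient of skewness (negative means skewed to the left)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable
abilities = [rng.gauss(0.0, 1.0) for _ in range(2000)]

easy_scores = simulate_scores(abilities, -2.0, 40, rng)  # easy items
hard_scores = simulate_scores(abilities, 2.0, 40, rng)   # hard items

print("easy test skewness:", round(skewness(easy_scores), 2))
print("hard test skewness:", round(skewness(hard_scores), 2))
```

The same examinees produce a pile-up of high scores on the easy test and of low scores on the hard test, which is the pattern described above.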

In general we will find that those who score high on one test will also
score high on the other test.  And those who score low on one test will generally
score low on the other test.  And those who are at the median on the hard test
will in general be at the median on the easy test.  In other words, we find
consistency in the performance of the examinees on the two tests of the same ability.
That's not a very earthshaking observation.  If we didn't find that consistency,
we would not be in the testing business.

To  explain  this  consistency  we  assume  there  is  something  about  the  examinees 
that  causes  them  to  score  consistently  relative  to  each  other.    We  call  that 
something  a  mental  trait.    No  one  has  ever  seen  a  mental  trait  and  no  one  really 
expects to.  Since there is no known physical referent for a mental trait, it
is called a "latent" trait.

The  branch  of  psychometrics  that  deals  with  this  latent  trait  is  called 
"latent  trait  theory". 

There are several different models within latent trait theory.  The models
are generally distinguished by the number of parameters in the model.


There is the 1-parameter model, also known as the Rasch model.  (I'll
explain later what the parameters are.)

There are 2-parameter models.  There are three of these.  The a-b model and
the b-c model, which were explored by Urry in 1970.  And there is a 2-parameter
polynomial model on which Samejima at the University of Tennessee is working.

The 3-parameter model is called Item Response Theory, which is the subject
of this book.  IRT was first presented by Fred Lord in his 1952 Ph.D. dissertation,
and was called Item Characteristic Curve Theory until 1977, when Fred Lord renamed
it Item Response Theory.

I'm told the Germans have an n-parameter model, which means there is no limit
on their number of parameters.  But I know nothing about their model.

In general, there has grown a consensus that the 3-parameter model best
describes reality.  Most of the work in latent trait theory is now concentrating
on the 3-parameter model.

Now back to the latent trait itself.

The scale of the latent trait is traditionally given the name of the Greek
letter theta (θ).  I will use the terms theta, ability level, amount of trait,
and amount of subject-matter knowledge interchangeably.  Theta is a continuum
from minus infinity (-∞) to plus infinity (+∞).  It has no natural zero point or
unit.  Therefore, the zero point and unit are often taken as the mean and standard
deviation, respectively, of some reference sample of examinees.  Thus, values of
θ usually vary from -3 to +3, but may be observed outside that range.  The θs of
a sample need not be distributed normally.

When an examinee walks into a testing room, he brings with him his theta.
The purpose of the test, then, is to measure the relative position of the
examinees on the theta scale.  The test is the measuring instrument.  The test
interprets the examinee's theta and produces a measurement of ability, which is
often the raw (number right) score.  Often measurement of an ability with a test
is made analogous to measurement of height with a tape rule.  But there is an
important difference.  Height, whether measured by an English rule or metric rule,
is always on an equal interval scale.  Histograms of a group of people will always
look the same except for some linear stretching of a scale.

That  is  not  the  case  with  testing.    The  histograms  of  raw  scores  of  the  same 
people  on  two  tests  will  seldom  look  the  same,  even  with  linear  stretching  of 
a  scale.    You  can  see  that  this  is  so  in  Figures  1  and  2.    No  amount  of  linear 
stretching  of  either  scale  will  make  the  two  distributions  look  the  same.    Figure  1 
will  always  be  skewed  to  the  left,  and  Figure  2  will  always  be  skewed  to  the  right. 
That  is  because  each  test  has  its  own  peculiar  scale  (also  called  metric).  The 
peculiarity  of  a  test's  metric  distorts  the  distribution  of  examinees.  Until 
IRT  there  has  been  no  way  to  identify  the  peculiar  scale  of  a  test. 

The  traditional  theory  of  testing  is  Classical  Test  Theory.    Most  testing 
practitioners  use  classical  test  theory,  whether  they  know  it  or  not.    The  basic 
tools of most testing practitioners are:

a.  p-value  =  proportion  of  examinees  selecting  an  item  alternative  (also 
called  "item  difficulty"), 

b.  d-value  =  point-biserial  correlation  between  the  item  alternative  and 
the  test  (some  use  the  biserial  correlation) (also  called  "item  discrimination"), 

c.  mean  of  examinees'  scores  (number  right), 

d.  standard  deviation  of  examinees'  scores, 

e.  skewness  and  kurtosis  of  examinees'  scores, 

f.  reliability  of  the  test,  usually  KR20,  the  Kuder-Richardson  Formula  20 
(a  special  case  of  Cronbach's  coefficient  alpha). 
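
As a rough illustration of the statistics listed above, the sketch below computes the p-value, the point-biserial d-value, and KR20 from a small matrix of right/wrong (1/0) item scores. The data and the function names are made up for illustration and do not come from any real test; population (divide-by-N) variances are assumed throughout.

```python
import math

def p_values(scores):
    """Proportion of examinees answering each item correctly."""
    n = len(scores)
    k = len(scores[0])
    return [sum(row[j] for row in scores) / n for j in range(k)]

def point_biserial(scores, j):
    """Correlation between item j (scored 0/1) and the number-right score."""
    n = len(scores)
    totals = [sum(row) for row in scores]
    item = [row[j] for row in scores]
    mx, my = sum(item) / n, sum(totals) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item, totals)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in item) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in totals) / n)
    return cov / (sx * sy)

def kr20(scores):
    """Kuder-Richardson Formula 20, using the population variance of totals."""
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n
    pq = sum(p * (1 - p) for p in p_values(scores))
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical data: 6 examinees (rows) by 4 items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print("p-values:", [round(p, 2) for p in p_values(scores)])
print("d-values:", [round(point_biserial(scores, j), 2) for j in range(4)])
print("KR20:", round(kr20(scores), 3))
```

Rerunning the same functions on a subset of the rows (for example, only the three highest scorers) changes every one of these numbers, which is the sample dependence discussed in this section.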


Anyone whose test analysis is principally based on the statistics listed
above is using classical test theory. The problem with those statistics is that
they are relative. They are relative to the distribution of ability among the
examinees, and they are relative to the characteristics of the other items in the
test.

The p-value is relative to the ability level of the examinees. The same
item given to a high ability group and a low ability group will get two different
p-values for the two groups. It can be shown that p-values are not true measures
of relative item difficulty. It is not uncommon for items measuring the same
ability to reverse the order of their p-values when given to groups of different
average ability. For example, item A may have a higher p-value than item B for
one group of examinees, but have a lower p-value than item B for a different group.
This effect is not a matter of sampling error.

The d-value is relative to the homogeneity of the ability levels of the
examinees in the sample, the subject matter homogeneity of the items in the test,
and the dispersion of p-values of items in the test. The same item, given to a
group of examinees who are similar in ability and to another group with a wide
range of ability, will produce two different d-values for the two groups. Similarly,
an item included in a test with other items that are homogeneous in content and
p-value will get a d-value different from the d-value it will receive in a
heterogeneous test.

The mean, standard deviation, skewness and kurtosis will also vary according
to the characteristics of the test and examinees.

The reliability is relative to the standard deviation of the test, and to the
p-values and d-values of the items in the test, all of which are dependent upon
the particular abilities of the examinees and the characteristics of the test.


It  can  be  shown  that  classical  parameters  (e.g.,  p-value)  will  generally  not 
be  linearly  related  across  subgroups  of  a  population.    This  means  that  the  test 
for  cultural  bias  using  classical  parameters  can  lead  to  an  artifactual  detection 
of  bias. 

Clearly,  classical  test  theory  statistics  are  meaningful  only  in  an  extremely 
limited situation, i.e., when the same item is given to the identical population
as  part  of  strictly  parallel  tests.    Such  a  situation  rarely  occurs.  Furthermore, 
the  basic  precepts  and  definitions  of  classical  test  theory  are  untestable,  i.e., 
they  are  tautologies.    They  are  simply  taken  as  true  without  any  way  to  empirically 
determine  their  relevance  to  reality.    Some  are  assumed  to  be  true  even  when  this 
does  not  appear  to  be  warranted.    Thus,  no  one  knows  if  the  classical  test  model 
applies  to  any  real  test. 

In contrast, IRT makes possible item and test statistics which are dependent
neither on the characteristics of the examinees nor on the other items in the
test. They are invariant. With the item statistics it becomes possible to describe
in precise terms the characteristics of the test before the test is administered.
This capability allows one to construct a test that is highly efficient in
accomplishing the purpose of the test. It also provides an extremely powerful tool for
special studies, such as item cultural bias.

Moreover, the assumptions of IRT are explicit and have the potential of
empirical testing. It is possible to discover if the data reasonably meet the assumptions.


The basic concept of IRT is the Item Response Function (IRF) (previously
called the Item Characteristic Curve). We define 2 variables:

θ = the ability scale

P(R|θ) = P(θ) = the probability of getting the item correct, given θ

The IRF is an S-shaped curve called an ogive (pronounced "ojive") that gives
the relationship between θ and P(θ). See Figure 3.


Figure  3.    An  Item  Response  Function. 


Figure  3  should  be  read  like  this: 

A person with the amount of ability indicated at A has a .25 probability of
getting the item correct (P(θ) = .25);

A person with a θ at B has a .40 probability of getting the item correct
(P(θ) = .40);

And a person with a θ at C has a P(θ) = .90.

Figure 4. Three IRFs (E, F, and G) with
b = -.5, 0.0, and 1.0 respectively.


Every  item  has  its  own  particular  IRF.    Each  IRF  is  defined  by  3  parameters: 
the  a-parameter,  the  b-parameter,  and  the  c-parameter.    Once  these  3  parameters 
are known, you know everything statistically about this item that it is possible
to know.

The b-parameter, or b-value as I will call it, is the horizontal location of
the inflection point of the IRF. Look at the lower left part of the curve in
Figure 3. That part of the curve is concave upward. The top right part of the curve
is concave downward. Somewhere in the middle of the curve, it must change from
being concave upward to concave downward. That point is called the inflection point.
The horizontal location of the inflection point on the θ scale is the b-value of
the item. The b-value is the difficulty index of the item. The larger the b-value
in the positive direction, the harder is the item. The b-values of items usually
vary from about -2.5 to +2.5.

Figure 4 shows the IRFs of 3 items, labeled E, F, and G, which are identical
except for their b-values, -.5, 0.0, and 1.0, respectively. You can see that of
the three items G (which has b = 1.0) is the hardest (i.e., has lower P(θ) for
any given θ).


IRFs have 2 asymptotes. The upper asymptote is always located on the vertical
axis at 1.00. In Figure 4 you can see that the upper, right part of the IRFs approaches
the value of 1.00 on the P(θ) axis. That is because as ability increases so does
the P(θ) up to its maximum of 1.00. A probability of 1.00 is a sure thing.

The lower asymptote of the IRF is the c-value. The c-value is the probability
that a person of very low ability will get the item correct.

Since  we  are  talking  about  multiple-choice  items,  there  is  always  a  finite 
probability  that  the  examinee  will  get  the  item  correct  by  guessing. 

Typically,  we  have  assumed  that  the  chance  probability  of  getting  the  item 
correct  is  1/A,  where  A  =  the  number  of  alternatives  in  the  multiple-choice 
question.    Thus,  we  have  assumed  that  a  four-choice  item  has  a  c  =  1/4  =  .25 
chance  of  being  guessed  correctly,  and  a  5-choice  item  has  a  c  =  1/5  =  .20  chance. 
That would be true, if examinees guessed truly randomly. But, in fact, examinees
do not guess randomly when they do not know the answer. They guess according to
certain patterns. Lord has suggested that item writers are very clever in writing
distractors that are attractive to low ability examinees. Research has shown that
when examinees do not know the answer they tend to guess the longest choice,
and to avoid choices with technical or unfamiliar terms. Some examinees use a
rule of thumb to always guess choice C. Whatever the reason, examinees do not
guess randomly, and therefore, the c-value is seldom equal to 1/A. Typically,
the c-value is .05 less than 1/A.

Most c-values range from .00 to .40. An item with a c-value of .30 or
greater is not a very good item. The lower the c-value is, the better. A c = .00
is ideal.


Figure 5 shows the IRFs of 3 items, labeled H, J, and K, which
are identical except for their c-values, .30, .25, and .15, respectively.
You can see that, although they all have the same b-value, they are of
differing difficulty for low ability examinees.

Figure 6. Three IRFs (L, M, and N) with
b = 0.0, c = .00, and a = .3, .8, and
2.0 respectively.

The third (and last) parameter of IRT is the a-parameter, or a-value. The
a-parameter is related to the slope of the IRF at the inflection point or in other
words at the b-value. For the normal ogive model (with c = .00),

$$a = \sqrt{2\pi}\, m \approx 2.5\, m,$$

where m is the slope of the ogive at the b-value. Usually a-values vary from
.5 to 2.5 with most between 1.00 and 2.00. The highest I have seen is 3.76.

Figure  6  shows  3  IRFs  (L,  M,  and  N),  which  are  identical  except  for  their 
a-values  =  .3,  .8,  and  2.0,  respectively,  with  b  =  0.0  and  c  =  .00.    As  you  can 
see,  the  larger  the  a-value,  the  steeper  the  IRF. 
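
The formula for the IRF itself is not printed in this paper. A minimal sketch, assuming the three-parameter normal ogive form P(θ) = c + (1 - c)·Φ(a(θ - b)), where Φ is the standard normal cumulative distribution, reproduces the behavior described for the a-, b-, and c-values. The function names `irf` and `slope` are illustrative, and the slope is estimated numerically.

```python
import math
from statistics import NormalDist

def irf(theta, a, b, c=0.0):
    """Probability of a correct response under an assumed 3-parameter normal ogive."""
    return c + (1.0 - c) * NormalDist().cdf(a * (theta - b))

def slope(theta, a, b, c=0.0, h=1e-5):
    """Central-difference estimate of the IRF slope at theta."""
    return (irf(theta + h, a, b, c) - irf(theta - h, a, b, c)) / (2 * h)

# Items L, M, and N of Figure 6: b = 0.0, c = .00, and a = .3, .8, 2.0.
for name, a in (("L", 0.3), ("M", 0.8), ("N", 2.0)):
    m = slope(0.0, a, 0.0)
    # With c = 0, sqrt(2*pi) times the slope at the b-value recovers a.
    print(name, "slope at b:", round(m, 3),
          "sqrt(2*pi)*m:", round(math.sqrt(2 * math.pi) * m, 3))
```

Under this assumed form, raising b shifts the curve to the right, raising c lifts the lower asymptote, and raising a steepens the curve at θ = b.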

The a-value is the discrimination index of the item. The higher the a-value
is, the more discriminating the item. The discriminating power of an item
varies along the θ-scale. Where the slope of the IRF is high the item discriminates
well. Where the slope is low the item discriminates poorly. In Figure 6 item N
has high slope from θ = -1.0 to θ = +1.0, but low slope elsewhere on the θ-scale.
Therefore, item N discriminates well within that range, but poorly elsewhere. A
test composed of items like item N would be an excellent test for discriminating
among examinees in the range θ = -1.0 to θ = +1.0. Item L has low slope across
a wide range of θ. Item L discriminates a little almost everywhere on the θ-scale,
but not especially well anywhere. Item L is not a very good item.

Comparing items L and N points up what is called the bandwidth paradox.
You can have an item with high discrimination over a narrow range, or low
discrimination over a wide range, but you can't have high discrimination over a wide
range. Thus, sometimes a compromise must be made between high discrimination
and the range of θ over which you have good discrimination.

Figures  7a  to  7d  show  the  IRFs  of  four  real  items  from  the  Coast  Guard 
Knowledge  section  of  the  Warrant  Officer  test. 

Item #17 (Figure 7a) is a hard item with high discrimination. It is the
item with the highest a-value I have seen. It is an extremely unusual item for
two reasons: its high a-value, and c-value equal to zero. Evidently, there is
something about this item that makes nearly all examinees with θ less than
+1.00 miss the item. That is a strange situation for a 4-choice item, but
actually occurs for this item.

The  item  in  Figure  7b  is  an  easy  item  with  somewhat  low  discrimination.  The 
item  in  Fig.  7c  is  slightly  easier,  but  has  good  discrimination.    The  item  in 
Fig.  7d  is  of  medium  difficulty,  but  has  poor  discrimination. 

Now  what  do  you  do  with  the  IRFs  once  you  have  them?    One  thing  you  can  do 
is to add them up. To add IRFs you merely take the height of the IRF of each
of the items in a test at a particular θ-value, add them together, and plot
that point. If you do this at several θ-values, and connect the points, you
have what is called the Test Characteristic Curve. The Test Characteristic
Curve (TCC) gives the true (number right) score for each value of θ.

Figure 7. The IRFs of four actual items from the
Coast Guard Knowledge section of the U. S. Coast
Guard Warrant Officer Test, series 8.

Figure 8. The Test Characteristic Curve of a test
composed of four real items.

Figure 8 shows the TCC for a test composed of the four items whose IRFs
are shown in Figure 7.


Notice that the TCC is neither a straight line nor an ogive. Each test will
have  its  own  TCC,  which  is  the  sum  of  the  IRFs  of  the  items  in  the  test. 
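
The adding-up of IRFs can be sketched in a few lines. The item parameters below are hypothetical (they are not the four Coast Guard items, whose full parameters are not listed here), and the three-parameter normal ogive form of the IRF is an assumption.

```python
from statistics import NormalDist

def irf(theta, a, b, c):
    """Assumed 3-parameter normal ogive IRF."""
    return c + (1.0 - c) * NormalDist().cdf(a * (theta - b))

def tcc(theta, items):
    """True (number right) score at theta: the sum of the item IRFs."""
    return sum(irf(theta, a, b, c) for a, b, c in items)

# Hypothetical (a, b, c) values for a four-item test.
items = [(2.0, 1.5, 0.00), (0.8, -1.0, 0.20), (1.5, -1.2, 0.15), (0.6, 0.0, 0.25)]

for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(theta, round(tcc(theta, items), 2))
```

At very low θ the curve levels off at the sum of the c-values rather than at zero, and at very high θ it approaches the number of items.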

One of the interesting uses of the TCC is to determine the distribution
of the true scores on the test. Figure 9 shows how this is done. If the
examinees' θs are normally distributed, as shown on the θ-scale (upside down), the
examinees' true scores will be as shown on the left. The true score distribution
is found by projecting the intervals from the θ-scale onto the TCC, and then
representing the same area on the true score scale within the projected intervals.
Figure 9 is an excellent demonstration of how the peculiarities of a test
produce a distorted metric.

It is important to note that true scores (T) are not observed scores (X).
Observed score is defined as true score plus error (X = T + E). However, Lord
has found that the distribution of X will be similar to the distribution of T,
but sometimes with the high points of the true score distribution flattened
somewhat, and the low points higher. The flattening is due to error.

We can see in Figure 7a that item #17 will not help us to distinguish among
examinees whose θs are less than 1.0 because they will all get the item wrong.
A test made exclusively of items like #17 would do nothing to distinguish among
examinees with θ < 1.0 because they would all get zero on the test. It would
give us no distinguishing information about them.

Item #17 also gives us no distinguishing information about examinees with
θ = 2.7 or greater because they will all get it correct. On a test composed
of items like #17, all examinees with θ ≥ 2.7 would get 100%.

Between θ = 1.0 and θ = 2.7, it is a different story. From θ = 1.0 to
θ = 1.5, P(θ) goes from P(θ = 1.0) = .00 to P(θ = 1.5) = .08. The change
of P(θ) means that the item does help to distinguish among examinees within
the range of θ where the change of P(θ) occurs.

We can see that the greater the slope of the IRF, the more information
the item gives us about examinees in the range being considered.

The slope of the IRF would be a measure of the relative amount of information
the item gives about examinees at that point. The greater the slope, the more
information.

If we plot the slope of the IRF, we have a function that shows the relative
amount of information an item gives at each point on the θ-scale. (Actually,
the slope is not a completely appropriate measure of information, but a closely
related function is.)

Figure 10. The Item Information Functions of four
real items.

Figure 11a, b, and c. The Test Information
Curve of a test composed of items #17
and #21, a test composed of items #17,
#21, and #47, and a test composed of
items #17, #21, #47, and #50 from the USCG
Warrant Officer Test.

The curve showing the amount of information provided by the item along the
θ-scale is called the Item Information Function (IIF).

The IIFs of the four items in Figure 7 are shown in Figure 10. (Note that
the vertical axis of item #17 is a different scale from the others.) You
can see the enormous amount of information provided by item #17 (with a =
3.76) compared to item #50 (with a = .62). Thus, the higher is the a-value,
the more information the item provides. Also of interest is the fact that
the higher is the c-value, the less information the item provides. The c-value
destroys information.
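
The information function is not written out in this paper. A common definition in the IRT literature, assumed here, is I(θ) = [P'(θ)]² / (P(θ)·(1 - P(θ))), where P'(θ) is the slope of the IRF. The sketch below uses that definition with the assumed normal ogive IRF. The a-values 3.76 and .62 are the ones quoted above for items #17 and #50; the b- and c-values are hypothetical.

```python
import math
from statistics import NormalDist

def irf(theta, a, b, c):
    """Assumed 3-parameter normal ogive IRF."""
    return c + (1.0 - c) * NormalDist().cdf(a * (theta - b))

def irf_slope(theta, a, b, c):
    """Analytic slope of the assumed IRF."""
    return (1.0 - c) * a * NormalDist().pdf(a * (theta - b))

def item_information(theta, a, b, c):
    """Assumed item information: squared slope divided by P * (1 - P).

    Only meaningful where P is strictly between 0 and 1.
    """
    p = irf(theta, a, b, c)
    return irf_slope(theta, a, b, c) ** 2 / (p * (1.0 - p))

# Same hypothetical difficulty (b = 0) for every comparison.
print("a = 3.76, c = .00:", round(item_information(0.0, 3.76, 0.0, 0.00), 2))
print("a =  .62, c = .00:", round(item_information(0.0, 0.62, 0.0, 0.00), 2))
print("a =  .62, c = .25:", round(item_information(0.0, 0.62, 0.0, 0.25), 2))
```

Under these assumptions the first line is much larger than the second, and the third is smaller again, in line with the two observations above about the a-value and the c-value.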

What do we do with the IIFs? We add them together. How do we add them
together? Just like we added the IRFs together to get the TCC. We take the
heights of the IIFs at particular θ-values, add them, and connect the points. The
result is the Test Information Curve (TIC).

Figure 11a shows the sum of the IIFs for items #17 and #21 as shown in Figure
10. Figure 11b shows the IIF of item #47 added to Figure 11a. Figure 11c
shows the IIF of item #50 added to the other 3 items. A test composed of these
four items would have the weird TIC in Figure 11c.

The TIC shows the relative amounts of information provided by the test at
each point on θ. Where you want information depends on what you will use the
test for. If you want to select a few examinees from a large number, then you
want a lot of information at high levels of θ, so that you can tell which
examinees are the best. For example, see Figure 12. If you want to select all
examinees except a few, then you want a lot of information at low θs so you can
tell which examinees are the worst (e.g., see Figure 13).

Sometimes a test is designed for more than one purpose, such as to be used with
two cut scores for entrance into two different schools. In this case a
two-humped TIC will give good information at the two cut scores (e.g., see Figure
14).

A TIC of any desired shape may be constructed, provided the items with the
necessary IIFs are available to construct the TIC.

Usually we already have a test and want to revise it to make it better
serve our purpose. A comparison of the new and old versions should be made
using the Relative Efficiency Curve (REC). The REC is nothing more than the
ratio of the TICs. The ratio of the two curves is found by dividing the
I(θ) of one test by the I(θ) of the other test at each point on θ. Figure
15 is the REC, comparing the TIC in Figure 14 to the TIC in Figure 13.

Where the REC is above 1.0, the test in Figure 14 (the test for which the
I(θ) is the numerator of the REC ratio) is better than the test for Figure 13.
Where the REC is below 1.0, the test for Figure 13 is better. And where the
REC = 1.0, the two tests are the same.

Figure 12. Test Information Curve of a hypothetical
test, which would be efficient for a high
cut score (θ = 2.0).

Figure 13. Test Information Curve of a hypothetical
test, which would be efficient for a low
cut score (θ = -2.3).

By  starting  with  an  old  test,  making  substitutions  of  items,  and  calculating 
the REC, you can experiment with and improve the old test by trial and error.
It  does  not  take  long  to  develop  some  skill  in  replacing  items  to  improve  the 
TIC  as  desired. 
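
The trial-and-error substitution described above might look like the following sketch. Both tests are hypothetical, and the item information function (squared IRF slope over P(1 - P), with a normal ogive IRF) is an assumed form, not one given in this paper.

```python
from statistics import NormalDist

def item_information(theta, a, b, c):
    """Assumed item information for a 3-parameter normal ogive item."""
    nd = NormalDist()
    p = c + (1.0 - c) * nd.cdf(a * (theta - b))
    dp = (1.0 - c) * a * nd.pdf(a * (theta - b))
    return dp * dp / (p * (1.0 - p))

def tic(theta, items):
    """Test Information Curve height at theta: the sum of the item IIFs."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

def rec(theta, new_items, old_items):
    """Relative efficiency at theta: I(theta) of the new test over the old."""
    return tic(theta, new_items) / tic(theta, old_items)

# Hypothetical (a, b, c) values; the revision swaps only the last item.
old_form = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]
new_form = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.5, 1.5, 0.2)]

for theta in (-2, -1, 0, 1, 2):
    print(theta, round(rec(theta, new_form, old_form), 2))
```

In this made-up case the harder, more discriminating replacement item raises the ratio above 1.0 at high θ and lowers it below 1.0 around θ = 0, so whether the swap is an improvement depends on where the cut score lies.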

Every  test  has  some  error  in  it.    The  Standard  Error  of  Estimate  (S.E.E.) 
is  the  expected  standard  deviation  of  errors  of  estimated  ability.    That  is, 
if we were to give a test to a group of examinees with identical θs, and
estimate their θs with the test, the standard deviation of those estimates
would be the S.E.E.

If the estimate of θ is unbiased, the S.E.E. at a particular θ is easy to
calculate from the TIC. The S.E.E. is equal to the square root of the reciprocal of the
height of the TIC (I(θ)).

$$\mathrm{S.E.E.} = \frac{1}{\sqrt{I(\theta)}}$$

Since I(θ) varies along the θ-scale, so will the S.E.E. The larger I(θ) is, the smaller
the S.E.E. A small S.E.E. at a cut point is highly desirable.
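
A short sketch of the calculation, using hypothetical items and the same assumed information function as in the literature (squared IRF slope over P(1 - P) for a normal ogive item):

```python
import math
from statistics import NormalDist

def item_information(theta, a, b, c):
    """Assumed item information for a 3-parameter normal ogive item."""
    nd = NormalDist()
    p = c + (1.0 - c) * nd.cdf(a * (theta - b))
    dp = (1.0 - c) * a * nd.pdf(a * (theta - b))
    return dp * dp / (p * (1.0 - p))

def tic(theta, items):
    """Height of the Test Information Curve at theta."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

def see(theta, items):
    """Standard Error of Estimate: square root of the reciprocal of I(theta)."""
    return 1.0 / math.sqrt(tic(theta, items))

# Hypothetical (a, b, c) values for a short test.
items = [(1.2, -1.0, 0.2), (1.0, 0.0, 0.2), (1.4, 0.5, 0.2), (0.9, 1.0, 0.2)]

for theta in (-2, 0, 2):
    print(theta, "I =", round(tic(theta, items), 3), "S.E.E. =", round(see(theta, items), 3))
```

Because information adds across items, doubling the test with items of the same kind doubles I(θ) and shrinks the S.E.E. by a factor of the square root of 2.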

The average S.E.E. over examinees is related to the reliability of Classical
Test Theory.


This relation implies that a test with high reliability may be a poor test for
your purposes because it has low information at the critical values of θ.
Similarly, a test with low reliability may be an excellent test for some purposes, if
it has high information where it is needed. Thus, reliability is highly misleading
as to the value of a test.

Figure 14. The Test Information Curve of a hypothetical
test, which would be efficient at both high
and low cut-scores.

Figure 15. The Relative Efficiency Curve comparing
the Test Information Curve in Figure 14 to that in
Figure 13.

The relation also makes clear the dependence of reliability on the distribution of
ability. If many examinees are on the θ scale where there is high information, the
reliability will be higher than if they are distributed on θ at points where information
is low.

What are the practical applications of IRT? There are many practical
applications. I will mention just a few.

First of all, IRT explains to us what a test is all about, and what an item is
really doing. In my opinion, for the first time in the history of testing,
testing practitioners can know what they are doing.

Second, IRT shows how to construct a test that is highly efficient for any
designated purpose.

Third,  IRT  makes  it  possible  to  estimate  an  examinee's  ability  level  with 
a  known  degree  of  accuracy,  and  without  making  the  dubious,  untestable  assumptions 
of  Classical  Test  Theory. 

Moreover,  IRT  provides  us  with  an  extremely  powerful  tool  for  special  studies, 
such  as  in  item  cultural  bias. 

Another exciting application of IRT is tailored testing, which is so named
because it allows the "tailoring" of the test to the ability of the examinee.

Tailored tests are administered by a computer with the items presented on a
CRT (Cathode Ray Tube device, which is similar to a television set). It works like
this:

(1) The examinee sits in front of a CRT attached to a typewriter keyboard.

(2) The examinee registers on the computer with his identification, test
name, and other pertinent information.

(3) In the computer is stored a bank of 150 to 200, or more, precalibrated
items along with their item parameters. The computer selects an item of
average difficulty and presents the item to the examinee on the CRT.

(4) The examinee records his answer on the typewriter keyboard.

(5) The computer uses the examinee's response and the item parameters to
estimate the examinee's most likely θ, and then selects another item. The
item selected is the one which will best help the computer estimate θ after the
examinee answers the item. If the examinee got the item correct, he will get
a different next item than if he got the item wrong.

(6) Steps (4) and (5) above are repeated until the computer meets the
criterion for stopping the test.

Examinees with different response patterns will, in general, get a different
set of items; yet their final estimates will be on the same metric. Not all
examinees may get the same number of items, yet all θ estimates can be to the
same degree of accuracy.
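
The loop in steps (3) to (6) can be sketched as below. Everything specific in it is an assumption made for illustration: the normal ogive IRF, choosing the next item by maximum information at the current estimate, a grid-search maximum-likelihood estimate of θ, stopping when the S.E.E. falls to 0.4 or after 20 items, and a randomly generated item bank with a simulated examinee.

```python
import math
import random
from statistics import NormalDist

ND = NormalDist()
GRID = [g / 10.0 for g in range(-40, 41)]  # candidate thetas from -4.0 to +4.0

def irf(theta, a, b, c):
    """Assumed normal ogive IRF, clamped away from 0 and 1 for numerical safety."""
    p = c + (1.0 - c) * ND.cdf(a * (theta - b))
    return min(max(p, 1e-9), 1.0 - 1e-9)

def info(theta, a, b, c):
    """Assumed item information: squared slope over P * (1 - P)."""
    p = irf(theta, a, b, c)
    dp = (1.0 - c) * a * ND.pdf(a * (theta - b))
    return dp * dp / (p * (1.0 - p))

def estimate_theta(responses):
    """Grid-search maximum-likelihood theta from (item, correct) pairs."""
    def loglik(theta):
        return sum(math.log(irf(theta, *item) if correct else 1.0 - irf(theta, *item))
                   for item, correct in responses)
    return max(GRID, key=loglik)

def tailored_test(bank, answer, max_items=20, target_see=0.4):
    """Give items one at a time; `answer(item)` returns True if answered correctly."""
    theta, responses, remaining = 0.0, [], list(bank)
    while remaining and len(responses) < max_items:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda it: info(theta, *it))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta = estimate_theta(responses)
        total = sum(info(theta, *it) for it, _ in responses)
        if total > 0 and 1.0 / math.sqrt(total) <= target_see:
            break  # stopping criterion: estimate is accurate enough
    return theta, responses

# Hypothetical bank of 150 precalibrated (a, b, c) items and a simulated examinee.
rng = random.Random(0)
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-2.5, 2.5), 0.2) for _ in range(150)]
true_theta = 1.0
theta_hat, responses = tailored_test(bank, lambda it: rng.random() < irf(true_theta, *it))
print("items given:", len(responses), "estimated theta:", theta_hat)
```

Starting the estimate at θ = 0 makes the first item one of roughly average difficulty, as in step (3). Other item-selection rules and stopping rules are possible; this sketch picks one of each only to make the loop concrete.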
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Tailored  testing  has  several  advantages  over  conventional  tests. 
(1) Depending upon the characteristics of the item bank, a tailored test
will use only 10% to 50% of the number of items required by a conventional
test and at the same time will measure more accurately than the conventional
test at almost all values of θ. Tailored tests can measure to any specified
degree of accuracy.

(2) A tailored test takes much less time to administer, or several abilities
can be measured by a tailored test in the same time needed to measure one ability
by a conventional test.

(3)    Security  of  the  items  is  much  improved,  because  different  examinees  get 
different  items,  and  because  the  items  are  much  less  accessible  (in  the  computer 
as  opposed  to  hard  copy). 

Work  is  progressing  toward  the  use  of  tailored  testing.    The  U.S.  Civil  Service 
Commission has adopted the use of tailored testing as a matter of policy. The
U.S. Air Force Human Resources Laboratory, San Antonio, Texas, has a tailored
testing machine operating on an experimental basis at the San Antonio AFEES
(Armed Forces Entrance Examination Station). Several studies of live tailored
testing  have  been  published  by  the  Psychometric  Methods  Program  at  the  University 
of  Minnesota.    The  Educational  Testing  Service  is  also  considering  tailored 
testing  and  intends  to  engineer  its  own  tailored  testing  machine. 

In closing, I hope that I have piqued your interest in IRT enough to read the
entire book, where these concepts are explained in detail.

The purpose of any communication is the creation of understanding. That is
my sole purpose: to create understanding of IRT in the reader.

If there is any part of this book that you do not understand, then I have
not been completely successful in my effort.

Therefore, I would sincerely appreciate any comments, suggestions, corrections,
ideas, or discussion about this book. Please feel free to telephone or write
to me for further explanation, discussion, criticism, or just plain chew the fat
about IRT.


THOMAS  A.  WARM,  Chief,  Exam  Branch 
Research  and  Examination  Division 
U.S.  Coast  Guard  Institute 
P.O.  Substation  18 
Oklahoma  City,  OK  73169 

(405)686-2417  --  commercial 
732-2417  —  FTS 


A NEW PROCEDURE TO MAKE MAXIMUM USE OF AVAILABLE INFORMATION WHEN
CORRECTING CORRELATIONS FOR RESTRICTION IN RANGE DUE TO SELECTION


James O. Boone
Chief,  Selection  &  Testing  Research  Unit 
Aviation  Psychology  Laboratory 
FAA  Civil  Aeromedical  Institute 
Oklahoma  City,  Oklahoma 


Introduction* 

To  develop  or  update  a  test  battery  used  for  selecting 
personnel,  two  very  important  steps  must  be  completed.    First,  the 
most  valid  tests  must  be  chosen,  and  second,  a  weighting  system  must 
be devised which will combine these tests into a composite that yields
a  maximum  validity  coefficient.    In  order  to  do  this  all  tests  under 
consideration  are  intercorrelated  with  each  other  and  correlated  with 
a  specified  criterion  of  job  success.    These  correlations  are  used  to 
regress  the  test  scores  on  the  job  success  criterion  and  the 
coefficients  from  the  regression  analysis  are  then  used  to  determine 
which  tests  should  be  included  in  or  deleted  from  the  battery  and  what 
the  relative  weights  should  be  for  each  test.    These  weighted  test 
scores  are  then  combined  to  form  the  composite  score  which  is  used  for 
selection. 

In order to determine the utility of tests, both old and current
tests, it is necessary to correlate them with some criterion measure
of job success. Unfortunately, job success measures are available
only for those individuals selected, and this selection is based on
scores only on current selection tests. An important factor
influencing the size of correlation coefficients between a test and the
criterion is the range of scores available on the tests and on the
criterion. Since information about the job success criterion is
available  only  for  applicants  who  have  been  selected  for  employment, 
only  the  upper  range  of  scores  is  available  on  the  criterion. 
Because  of  this  restriction  in  range,  the  correlations  between  current 
selection  test  scores  and  the  job  success  criterion  will  be  spuriously 
low. 
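
The attenuation described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original study: the population correlation of .50, the 30-percent selection ratio, the sample size, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_restriction(rho=0.5, n=20000, selection_ratio=0.3, seed=1):
    """Return (applicant-group r, selected-group r) for test x and criterion z.

    rho, n, selection_ratio, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((x, z))
    full_r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    pairs.sort(key=lambda p: p[0], reverse=True)   # best test scores first
    kept = pairs[: int(n * selection_ratio)]       # select the top slice only
    restricted_r = pearson([p[0] for p in kept], [p[1] for p in kept])
    return full_r, restricted_r


full_r, restricted_r = simulate_restriction()
print(f"applicants: r = {full_r:.2f}; selected group: r = {restricted_r:.2f}")
```

Under these assumptions the correlation computed on the selected group falls well below the applicant-group value, which is the spurious lowering the text refers to.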

The  new  tests  being  considered  to  replace  part  or  all  of  an 
existing  test  battery  will  have  a  larger  range  and  variance  in  the 
selected  group  than  the  five  tests  actually  used  for  selection.  In 
fact,  the  range  and  variance  will  be  restricted  only  to  the  extent 
that  the  new  tests  correlate  with  the  old  tests,  and  will  be  as 
restricted  as  the  old  tests  only  if  this  correlation  is  1.0.  Because 
of this differential restriction in range, the new tests will
correlate higher with the job success criterion in the selected group than
will the current tests.

To adjust for this spurious result, the correlations with the job
success criterion must be corrected for restriction in range to assess
the validity of the tests used for selection and to determine how the
current tests used for selection compare with the new tests. The
correction must take place prior to performance of regression
analyses; otherwise, the new tests will appear superior to the current
tests because of nothing more than a statistical artifact. This also
means that, when corrected, the new test correlations with the
criterion will generally increase less than the old test correlations.

The  Uniform  Guidelines  on  Employee  Selection  (1978)  state  that 
tests used for personnel selection must be demonstrated to be valid
predictors of job success, and the magnitude of the validity
coefficient must be both "practically and statistically significant"
(3).    The  spuriously  low  correlation  coefficient  due  to  selection, 
then,  becomes  a  very  important  legal  issue  in  addition  to  its 
importance  in  assessing  the  value  of  new  selection  tests.  Numerous 
litigations  have  occurred  as  a  result  of  this  problem,  several  of  which 
related  to  the  accuracy  of  the  methods  employed  in  correcting  the 
validity  coefficients  for  restriction  in  range  (1). 

There are two major statistical formulas which have been
developed to correct the correlation of a test and a job success
criterion. Both major formulas estimate the value of RRyz based on
the information available on the restricted group: Rxy, Rxz, Ryz, Sx,
Sy, and Sz. They differ in their assumptions about information
available on the unrestricted group.

The first formula (5), Thorndike's formula 7 case III (hereafter
referred to as T7), assumes that only SSx is available for the
unrestricted group and uses the ratio SSx/Sx and the restricted
correlations to estimate RRxy, RRxz, SSy, and SSz. These estimates in turn
are used to estimate RRyz. The second major formula (4), Gulliksen's
formula 37 (hereafter referred to as G37), assumes that only SSy is
available on the unrestricted group and uses SSy-Sy and the restricted
correlations and variances to estimate RRxy, RRxz, SSx, and SSz.
These also are used to estimate RRyz, which is, of course, the desired
unrestricted correlation of the test and the job success criterion.

The problem in using either of these formulas for the ATC
selection situation is that both T7 and G37 require making estimates of
either SSx or SSy and RRxy, when this unrestricted information is
actually available from the applicant sample. The purpose of this
study was to develop a procedure for correcting for restriction in
range using available unrestricted values. In the two formulas
already developed, estimates of SSz and RRxz only are required to
estimate RRyz. In order to make maximum use of the unrestricted
information, two formulas were derived by the author. The first
formula (hereafter referred to as B1) uses SSx to derive estimates of
SSz and RRxz. The second formula (hereafter referred to as B2) uses
SSy to derive estimates of these variables. In both formulas, the
estimates, along with the actual unrestricted values of RRxy and
either Sx or Sy, were used in conjunction with restricted correlations
to estimate RRyz. The four formulas were compared both mathematically
and by using Monte Carlo techniques to determine which can be most
accurate in estimating RRyz across different selection ratios and
different correlation values.

Methods.


Following Gulliksen's (4) schema for derivation of the correction
formulas, three assumptions were employed, where upper case and lower
case letters represent unrestricted and restricted variables
respectively and x = the test used for selection, y = the new test
being assessed, z = the success criterion, RR = the unrestricted
correlation of the variable subscripted, SS = the unrestricted standard
deviation of the variable subscripted, R = the restricted correlation
of the variable subscripted, and S = the restricted standard deviation
of the variable subscripted.

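Employing the following assumptions:

\[ \frac{R_{xy}\,S_y}{S_x} = \frac{\mathit{RR}_{xy}\,\mathit{SS}_y}{\mathit{SS}_x}, \qquad \frac{R_{xz}\,S_z}{S_x} = \frac{\mathit{RR}_{xz}\,\mathit{SS}_z}{\mathit{SS}_x} \tag{1} \]

\[ S_y^2\,(1 - R_{xy}^2) = \mathit{SS}_y^2\,(1 - \mathit{RR}_{xy}^2), \qquad S_z^2\,(1 - R_{xz}^2) = \mathit{SS}_z^2\,(1 - \mathit{RR}_{xz}^2) \tag{2} \]

and

\[ \frac{R_{yz} - R_{xy}R_{xz}}{\sqrt{(1 - R_{xy}^2)(1 - R_{xz}^2)}} = \frac{\mathit{RR}_{yz} - \mathit{RR}_{xy}\mathit{RR}_{xz}}{\sqrt{(1 - \mathit{RR}_{xy}^2)(1 - \mathit{RR}_{xz}^2)}} \tag{3} \]

it can be shown that

\[ \mathit{SS}_y^2 = S_y^2 \left[ (1 - R_{xy}^2) + R_{xy}^2\,\frac{\mathit{SS}_x^2}{S_x^2} \right] \tag{4} \]

and

\[ \mathit{SS}_z^2 = S_z^2 \left[ 1 - R_{xz}^2 + R_{xz}^2\,\frac{\mathit{SS}_x^2}{S_x^2} \right] \tag{5} \]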

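Equation (3) can be solved for RRyz, and equation (1) can be solved for
RRxyRRxz to produce

\[ \mathit{RR}_{yz} = (R_{yz} - R_{xy}R_{xz})\,\frac{S_y S_z}{\mathit{SS}_y\,\mathit{SS}_z} + \mathit{RR}_{xy}\mathit{RR}_{xz} \tag{6} \]

and

\[ \mathit{RR}_{xy}\mathit{RR}_{xz} = R_{xy}R_{xz}\,\frac{S_y S_z\,\mathit{SS}_x^2}{S_x^2\,\mathit{SS}_y\,\mathit{SS}_z} \tag{7} \]

Substituting (7) in (6) and factoring out SySz/SSySSz,

\[ \mathit{RR}_{yz} = \frac{S_y S_z}{\mathit{SS}_y\,\mathit{SS}_z} \left[ R_{yz} - R_{xy}R_{xz} + R_{xy}R_{xz}\,\frac{\mathit{SS}_x^2}{S_x^2} \right] \tag{8} \]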


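Substituting the estimates for SSy (4) and SSz (5) in the root
formula (8) and simplifying gives

\[ \mathit{RR}_{yz} = \frac{R_{yz} - R_{xy}R_{xz} + R_{xy}R_{xz}\,\dfrac{\mathit{SS}_x^2}{S_x^2}}{\sqrt{\left[ 1 - R_{xy}^2 + R_{xy}^2\,\dfrac{\mathit{SS}_x^2}{S_x^2} \right] \left[ 1 - R_{xz}^2 + R_{xz}^2\,\dfrac{\mathit{SS}_x^2}{S_x^2} \right]}} \tag{9} \]

Formula (9) is equivalent to Thorndike's T7 (and also to Gulliksen's
formula 19, ref. 4, p. 149).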
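
For readers who wish to apply formula (9) numerically, a minimal sketch follows. The function and argument names are assumptions introduced here (r_* for restricted correlations, s_x and ss_x for the restricted and unrestricted standard deviations of x); they do not come from the original paper.

```python
import math


def t7_correct(r_xy, r_xz, r_yz, s_x, ss_x):
    """Estimate RRyz from restricted values and SSx, following formula (9).

    Argument names are illustrative assumptions, not the paper's notation.
    """
    k = (ss_x / s_x) ** 2                       # SSx^2 / Sx^2
    numerator = r_yz - r_xy * r_xz + r_xy * r_xz * k
    denominator = math.sqrt(
        (1 - r_xy ** 2 + r_xy ** 2 * k) * (1 - r_xz ** 2 + r_xz ** 2 * k)
    )
    return numerator / denominator


# With no restriction (Sx equal to SSx) the correction leaves Ryz unchanged.
print(t7_correct(0.4, 0.3, 0.35, s_x=1.0, ss_x=1.0))
```

When Sx equals SSx the ratio k is 1 and the formula returns Ryz itself, which is a convenient sanity check on any implementation.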

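It can also be shown from assumptions (1) through (3) that

\[ \mathit{SS}_x = S_x\,\frac{\sqrt{\mathit{SS}_y^2 - S_y^2\,(1 - R_{xy}^2)}}{S_y R_{xy}} \tag{10} \]

and

\[ \mathit{SS}_z^2 = S_z^2 \left[ \frac{S_y^2 R_{xy}^2 - S_y^2 R_{xz}^2 + \mathit{SS}_y^2 R_{xz}^2}{S_y^2 R_{xy}^2} \right] \tag{11} \]

Returning to the root equation (8), substituting the estimates for SSx
(10) and SSz (11) and simplifying produces the second correction formula,

\[ \mathit{RR}_{yz} = \frac{R_{xz}\,(\mathit{SS}_y^2 - S_y^2) + R_{xy}R_{yz}\,S_y^2}{\mathit{SS}_y \sqrt{R_{xz}^2\,(\mathit{SS}_y^2 - S_y^2) + S_y^2 R_{xy}^2}} \tag{12} \]

Formula (12) is Gulliksen's formula G37.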
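
A minimal sketch of formula (12) follows, written out from substituting (10) and (11) into the root equation (8). The function and argument names are assumptions made for illustration (s_y and ss_y are the restricted and unrestricted standard deviations of y).

```python
import math


def g37_correct(r_xy, r_xz, r_yz, s_y, ss_y):
    """Estimate RRyz from restricted values and SSy, following formula (12).

    Argument names are illustrative assumptions, not the paper's notation.
    """
    d = ss_y ** 2 - s_y ** 2                    # SSy^2 - Sy^2
    numerator = r_xz * d + r_xy * r_yz * s_y ** 2
    denominator = ss_y * math.sqrt(r_xz ** 2 * d + s_y ** 2 * r_xy ** 2)
    return numerator / denominator


# With no restriction on y (Sy equal to SSy) the estimate reduces to Ryz.
print(g37_correct(0.4, 0.3, 0.35, s_y=1.0, ss_y=1.0))
```

The sketch assumes Rxy is positive, as the derivation of (10) does when it divides by SyRxy.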

The third and fourth correction formulas employ the assumptions of
the first and second correction formulas, respectively, and make the
additional assumption that the new test under consideration, test y,
was administered to the applicant group. Consequently, there is no
need to estimate RRxy, SSy, or SSx, and formula (6) can be utilized as
the root formula.

Substituting estimates for SSz (5) and RRxz (14) used in deriving
the first correction formula (9) in the root formula (6) and simplifying
gives the third correction formula,

\[ \mathit{RR}_{yz} = \frac{\dfrac{S_y}{\mathit{SS}_y}\,(R_{yz} - R_{xy}R_{xz}) + \dfrac{R_{xz}\,\mathit{SS}_x}{S_x}\,\mathit{RR}_{xy}}{\sqrt{1 - R_{xz}^2 + R_{xz}^2\,\dfrac{\mathit{SS}_x^2}{S_x^2}}} \tag{13} \]
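
The third correction formula (B1) can be sketched in the same way. This is a minimal illustration written from the root formula (6), with SSz taken from (5) and RRxz from (1); the function and argument names are assumptions, and rr_xy and ss_y are the values observed on the applicant group.

```python
import math


def b1_correct(r_xy, r_xz, r_yz, s_x, s_y, ss_x, ss_y, rr_xy):
    """Estimate RRyz when SSx, SSy, and RRxy are known from the applicants.

    Follows formula (13); argument names are illustrative assumptions.
    """
    k = ss_x / s_x                                        # SSx / Sx
    ssz_over_sz = math.sqrt(1 - r_xz ** 2 + r_xz ** 2 * k ** 2)  # from (5)
    first = (s_y / ss_y) * (r_yz - r_xy * r_xz)
    second = k * r_xz * rr_xy
    return (first + second) / ssz_over_sz


# With no restriction the formula returns Ryz when RRxy equals Rxy.
print(b1_correct(0.4, 0.3, 0.35, 1.0, 1.0, 1.0, 1.0, rr_xy=0.4))
```

Unlike T7, this version never estimates RRxy or SSy; it takes them as inputs, which is the point of the procedure.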

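To obtain the fourth correction formula, RRxz must be derived in
terms of (SSy-Sy) by first solving equation (2) for RRxz,

\[ \mathit{RR}_{xz}^2 = 1 - \frac{S_z^2\,(1 - R_{xz}^2)}{\mathit{SS}_z^2} \tag{14} \]

Substituting (11) in (14), multiplying and simplifying yields

\[ \mathit{RR}_{xz} = R_{xz} \sqrt{\frac{\mathit{SS}_y^2 - S_y^2 + S_y^2 R_{xy}^2}{\mathit{SS}_y^2 R_{xz}^2 - S_y^2 R_{xz}^2 + S_y^2 R_{xy}^2}} \tag{15} \]

To form the fourth correction formula, (11) and (15) are
substituted in the root formula (6) and simplified giving

\[ \mathit{RR}_{yz} = \frac{S_y\,(R_{yz} - R_{xy}R_{xz})}{\mathit{SS}_y \sqrt{\dfrac{S_y^2 R_{xy}^2 - S_y^2 R_{xz}^2 + \mathit{SS}_y^2 R_{xz}^2}{S_y^2 R_{xy}^2}}} + \mathit{RR}_{xy}\,R_{xz} \sqrt{\frac{(\mathit{SS}_y^2 - S_y^2) + S_y^2 R_{xy}^2}{\mathit{SS}_y^2 R_{xz}^2 - S_y^2 R_{xz}^2 + S_y^2 R_{xy}^2}} \tag{16} \]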
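
A minimal sketch of the fourth correction formula (B2) is given below. It is written from the root formula (6) with SSz from (11) and RRxz from (15); the function and argument names are assumptions, and the sketch assumes Rxy is nonzero because (11) divides by it.

```python
import math


def b2_correct(r_xy, r_xz, r_yz, s_y, ss_y, rr_xy):
    """Estimate RRyz when SSy and RRxy are known from the applicant group.

    Follows formula (16); argument names are illustrative assumptions.
    """
    n = ss_y ** 2 * r_xz ** 2 - s_y ** 2 * r_xz ** 2 + s_y ** 2 * r_xy ** 2
    sz_over_ssz = s_y * abs(r_xy) / math.sqrt(n)              # from (11)
    rr_xz = r_xz * math.sqrt(
        (ss_y ** 2 - s_y ** 2 + s_y ** 2 * r_xy ** 2) / n     # formula (15)
    )
    first = (r_yz - r_xy * r_xz) * (s_y / ss_y) * sz_over_ssz
    return first + rr_xy * rr_xz


# With no restriction the formula returns Ryz when RRxy equals Rxy.
print(b2_correct(0.4, 0.3, 0.35, s_y=1.0, ss_y=1.0, rr_xy=0.4))
```

The first term is the partial-correlation part of (6) rescaled by Sy/SSy and Sz/SSz, and the second term is RRxy times the estimate of RRxz.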

A demonstration of the characteristics of the four correction
formulas in terms of more [...] influences was performed by using
Monte Carlo techniques. The Monte Carlo study examined the comparative
accuracy of the four correction formulas as a function of (i) the
selection ratio, (ii) RRxy, and (iii) RRyz.


In order to generate data of known means, standard deviations, and
intercorrelations, a program [...] was modified by the author and
used. The program uses [...] a reasonably fast method to
generate normally distributed variables whose covariances are those
required by a specified correlation matrix input into the program.

A summary of the process is as follows:

1. Generate 1,000 subjects with scores on 11 variables as defined
by means, standard deviations, and correlations.

2. Sort sample into descending order [...] on variable [...].

3. Restrict sample based on selection ratios of 10 percent, 20
percent, 30 percent, 40 percent, and 50 percent.

4. Calculate the four different estimates of RRyz for each
restricted sample based on values of RRxy ranging from 0.2 to 0.6 and
on values of RRyz ranging from 0.1 to 0.5.

5. Transform all correlations and estimated correlations by using
Fisher's z transformation [...] before averaging.

6. Repeat the entire process 100 times and compute the mean on
all transformed correlations.
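
The steps above can be sketched in a few lines. The following is only an illustration under stated assumptions: it uses three variables rather than 11, applies only formula (9) (T7), averages in Fisher's z and transforms the mean back to r, and uses hypothetical function names and population values (RRxy = .6, RRxz = .3, RRyz = .3). It is not the program used in the study.

```python
import math
import random
from statistics import mean, pstdev


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def draw_sample(n, rr_xy, rr_xz, rr_yz, rng):
    """Step 1: draw n (x, y, z) triples from a standard trivariate normal."""
    # Hand-written Cholesky factor of the 3 x 3 correlation matrix.
    l22 = math.sqrt(1 - rr_xy ** 2)
    l32 = (rr_yz - rr_xz * rr_xy) / l22
    l33 = math.sqrt(1 - rr_xz ** 2 - l32 ** 2)
    rows = []
    for _ in range(n):
        e1, e2, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((e1, rr_xy * e1 + l22 * e2, rr_xz * e1 + l32 * e2 + l33 * e3))
    return rows


def t7(r_xy, r_xz, r_yz, s_x, ss_x):
    """Formula (9) of the text."""
    k = (ss_x / s_x) ** 2
    num = r_yz - r_xy * r_xz + r_xy * r_xz * k
    den = math.sqrt((1 - r_xy ** 2 + r_xy ** 2 * k) * (1 - r_xz ** 2 + r_xz ** 2 * k))
    return num / den


def run(selection_ratio, reps=100, n=1000, rr_xy=0.6, rr_xz=0.3, rr_yz=0.3, seed=0):
    """Return (mean restricted Ryz, mean T7 estimate) over all replications."""
    rng = random.Random(seed)
    z_raw, z_t7 = [], []
    for _ in range(reps):                                  # step 6
        rows = draw_sample(n, rr_xy, rr_xz, rr_yz, rng)    # step 1
        ss_x = pstdev([row[0] for row in rows])            # applicant-group SD of x
        rows.sort(key=lambda row: row[0], reverse=True)    # step 2
        x, y, z = zip(*rows[: int(n * selection_ratio)])   # step 3
        r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)
        est = t7(r_xy, r_xz, r_yz, pstdev(x), ss_x)        # step 4 (T7 only)
        z_raw.append(math.atanh(r_yz))                     # step 5
        z_t7.append(math.atanh(est))
    return math.tanh(mean(z_raw)), math.tanh(mean(z_t7))


if __name__ == "__main__":
    raw, corrected = run(selection_ratio=0.3)
    print(f"mean restricted Ryz = {raw:.3f}, mean T7 estimate = {corrected:.3f}")
```

Averaging in the z metric and transforming back is one common way to average correlations; whether the original program back-transformed the means is not stated in the text.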

The results were prepared in tabular and graphical form. Since
the sample size was 100,000, significance tests were deemed
inappropriate. In order to assess the accuracy of prediction of each
correction procedure, an error term was calculated based on the
absolute value of the difference between the actual unrestricted
correlation RRyz and the estimated correlation Ryz. Table 1 contains
this error term, RRyz - Ryz, for each correction formula, for each
selection ratio, for each value of RRxy, and for each value of RRyz.
Figure 1 represents this error term as a function of selection ratio
for the four correction formulas and for the actual restricted
correlation Ryz. Figure 2 represents the error term as a function of RRxy
for the four formulas and Ryz. Figure 3 represents the error term as
a function of RRyz for the four formulas and Ryz.

Discussion.


In any Monte Carlo study a decision must be made concerning which
components are to be varied and what the range of their variation will
be. The components selected for variation and their range in this
study were established subjectively based on values the author
considered representative of practical situations. Consequently, the
discussion of the results is more a comparison of the practical
utility of each formula than a strict mathematical comparison.

Main effects. Table 1 demonstrates the overall accuracy of each
of the four formulas in terms of the average amount of error each
incurred in estimating RRyz. Their rank order from least to most error
is: B1, T7, B2, and G37. The first three formulas are not remarkably
different; however, G37 is far less accurate than B1, T7, and B2. The
clearest effect on error is produced by the selection ratio (Table 1).
As the selection ratio becomes extreme, the amount of error
increases, with the increase becoming larger and larger with each step
down in the selection ratio. Table 1 shows little fluctuation in
error for RRxy and no systematic pattern. The effects of RRyz in
Table 1 show a pattern that was found consistently throughout the
analyses. When RRyz = RRxz, the error component is at a minimum.
RRxz was held at a constant .30 for this study and, as can be noted in
Table 1, the error increases as RRyz moves in either direction from
.30.

Practical conclusions related to main effects include the
following. If sufficient information is available, the B1 formula
produces the most accurate estimate for RRyz. In order to have
sufficient information to use B1, the new test being evaluated would
need to be administered to the applicant group at the same time the
old selection test is administered. Then RRxy and SSy are available
for use in B1. If the new test being evaluated was not administered
to the applicant group, then the most accurate correction formula
would be T7 which does not require RRxy and SSy.

The selection ratio, it appears, has the largest impact on errors
in estimating RRyz. If selection is extreme, 10 percent or less, the
formulas for estimating RRyz are unstable and highly inaccurate. This
is a difficult practical situation to resolve. A general
advertisement for applicants without sufficient specific qualification
statements results in a larger number of unqualified candidates and more
extreme selection. However, with a highly specific advertisement
self-selection becomes a secondary selection process, and the
statistics computed on the applicant group are already restricted,
producing spuriously low validity correlations. One strategy would be
to administer the selection tests to a random sample in the general
population, stratifying by race and sex in order to meet Equal
Employment Opportunity Commission requirements. This would yield
unrestricted variances without the influence of any selection
procedure.

Since RRyz is not known and RRxy is computed after the test
administration, little practical guidance can be offered related to
these parameters. The usual advice is clearly applicable, viz,
choose a test or construct a test for selection that parallels the
actual job tasks as closely as possible.

Interaction effects. As seen in Figure 1, when error in
prediction is examined by selection ratio for each formula and for the
actual restricted correlation of Ryz, there is a tremendous amount of
error for the 10-percent selection ratio, with formula B1 doing a much
better job than either T7, B2, or G37 in estimating RRyz. As the
selection ratio increases beyond moderate selection (30 percent), the
formulas tend to perform similarly in estimating RRyz, with the
exception of G37 which consistently has more error than the other
three formulas across all selection ratios.

Figure 2 demonstrates that formula B1 again is consistently the
better estimator of RRyz across values of RRxy. It can also be noted
from Figure 2 that as the value of RRxy increases, Ryz rapidly
becomes a poorer estimator of RRyz, particularly after it passes the
point at which RRyz equals RRxz (.30). Once again, G37 is a much less
accurate estimator of RRyz than the other three formulas.

When RRyz is less than .30, as shown in Figure 3, B1 is the
better estimator of RRyz. All formulas converge when RRyz equals
RRxz (.30) and T7 is the best estimator for higher values of RRyz
although the differences are small. Once again formula G37 is clearly
the least accurate estimator of RRyz.

The practical implications for the interaction effects can be
stated briefly. The selection ratio has such an overwhelming effect
that generally the interaction effects are primarily due to the
selection ratio. When the selection ratio is small to moderate (10 to
30 percent), formula B1 is clearly the most accurate estimator and
should be used regardless of RRxy and RRyz. When the selection ratio
goes above 30 percent, B1, T7, and B2 are practically equivalent.
Formula G37 is the least desirable correction formula across
conditions. Thus, overall, B1 results in the most accurate estimates
of RRyz, especially when the selection ratio is 30 percent or less,
regardless of the values of RRxy or RRyz.

Table 1. Average Error in Estimation of RRyz


Errror  — rmula 


Bl 

(  .• 

G37 

B2 

3.055: 

0.078 

0.058 

£3is  = 

.  .  01' 

0.11 

C.07 

i*-  ~-;is5cnrion  Ratio 

m 

30% 

kO% 

50« 

MeariS  = 

0.050 

0.037 

0.028 

St  as  = 

c  . : 

0.11 

0.08 

0.06 

.2  ) 

.30 

.50 

Mearn:  = 

0 ,. ;  ' 

C  .  '>59 

0.058 

0.05^^ 

0.062 

StEI  = 

n..  It 

0.1i^ 

0.1i^ 

0.18 

by  ^"Vz 

.  10 

J 

.30 

.i^0 

.50 

Meiina  = 

0  .073 

r  ..oS 

0.0^8 

0.060 

0.056 

SidsL  = 

•  .17 

.13 

0.11 

0.15 

0.18 

Figure 1. Error by selection ratio for the four correction formulas
and the actual restricted value of Ryz.

Figure 2. Error by values of RRxy for the four correction formulas and
the actual restricted value of Ryz.

Figure 3. Error by values of RRyz for the four correction formulas and
the actual restricted value of Ryz.

I. Introduction.

The Uniform Guidelines on Employee Selection Procedures (1978)
(9), which were recently adopted by the U.S. Civil Service Commission,
the Equal Employment Opportunity Commission, the Department of Justice,
and the Department of Labor, state that a selection procedure has an
adverse impact if the selection rate for any racial, ethnic, or sex
group is less than four-fifths of the rate for the group with the
highest selection rate. The guidelines further state that these same
rules apply to any employment decision, which can include training,
retention, or promotion. The current Air Traffic Control (ATC)
training program conducted at the Federal Aviation Administration's
(FAA) Academy is a pass/fail program which affects whether or not the
trainee will be retained by the FAA in the ATC option. As such, it
involves an employment decision and is subject to the standards for
validation research and fairness defined by the guidelines.
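
The four-fifths rule stated above is simple arithmetic and can be sketched as follows. The function name, the group labels, and the counts are hypothetical and serve only to illustrate the comparison of selection rates.

```python
def adverse_impact(selected, applicants, threshold=0.8):
    """Flag groups whose selection rate is below four-fifths of the top rate.

    selected and applicants map a group label to a count; all names and
    numbers used with this function here are hypothetical.
    """
    rates = {group: selected[group] / applicants[group] for group in applicants}
    top_rate = max(rates.values())
    return {group: (rate / top_rate) < threshold for group, rate in rates.items()}


# Hypothetical counts: group A passes 50 of 100, group B passes 30 of 100.
flags = adverse_impact({"A": 50, "B": 30}, {"A": 100, "B": 100})
print(flags)
```

In this hypothetical case the rate for group B is .30 against a top rate of .50, a ratio of .60, so group B would be flagged.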

Although  the  Uniform  Guidelines  acknowledge  that  "the  concept  of 
fairness  or  unfairness  of  selection  procedures  is  a  developing 
concept,"  they  require  that,  when  feasible,  a  test  must  be  demonstrated 
to  be  fair.    The  guidelines  further  specify  that  "unfairness  is 
demonstrated  through  a  showing  that  members  of  a  particular  group 
perform  better  or  poorer  on  the  job  than  their  scores  on  the  selection 
procedure  would  indicate  through  comparison  with  how  members  of  other 
groups  perform."    The  key  concept  in  this  definition  of  fairness  is 
that  performance  of  a  group  is  compared  to  the  performance  of  the 
larger  group  on  both  the  selection  procedures  and  the  job  performance 
measures.    If  performance  is  not  the  same  for  both  groups  on  both 
measures,  unfairness  may  exist. 

Unfortunately,  deciding  when  "performance  is  not  the  same"  is  not 
as  simple  as  it  may  seem.    The  literature  has  many  articles  offering 
approaches  to  the  evaluation  of  test  fairness.    However,  these 
articles  seldom  deal  with  the  distribution  of  various  fairness 
indices,  nor  do  they  address  directly  the  decision  processes  involved 
in  deciding  whether  or  not  a  test  is  fair.    Several  authors  have  found 
that the major definitions of test fairness lead to conflicting
conclusions about test fairness (1,4,7). In addition, Hunter and Schmidt
(5) concede that they cannot agree on a definition of test fairness.
The available literature offers many methods of evaluating test
fairness but little guidance in choosing the most appropriate method.


Most  of  the  models  of  test  fairness  define  it  in  psychometric 
terms.    The  three  major  models  to  be  discussed  in  the  present  study 
define  fairness  in  the  dichotomous  case  in  which  an  applicant  is 
either  accepted  or  rejected  based  on  a  predictor  score  and  would 
succeed or fail based on a criterion. Table 1 graphically depicts this
situation  and  states  the  three  major  models  of  test  fairness, 
verbally  and  mathematically,  in  terms  of  the  four  cells  depicted  in 
the  table. 

The first model is Thorndike's (8) Constant Ratio model (CR)
which states that for a test to be fair, the ratio of the proportion
successful to the proportion selected should be equal for the
minority and the majority groups. Expressed in terms of the cells in
Table 1, the ratio of the sum of cells I and II to the sum of
cells I and IV should be equal for both groups. Darlington's (2)
Conditional Probability model (CP) states that a test is fair if the
probability of selection, given that an individual is successful, is
equal for both groups. In terms of the cells in Table 1, the ratio
of cell I to the sum of cells I and II should be equal for both
groups. Finally, Einhorn and Bass (3) propose the Equal Probability
model (EP) in which a test is considered fair if the probability of
success, given that an individual is selected, is equal for both the
minority and the majority groups. In terms of the cells in Table 1,
the ratio of cell I to the sum of cells I and IV should be equal for
both groups. The three models differ in the target groups to which
they are "fair." The Constant Ratio model is aimed at insuring that
the proportion of applicants selected from both groups is fair. If
this model is used, an equitable proportion of applicants from both
groups will be hired. The Conditional Probability model is targeted at
successful individuals and is intended to insure that an equitable
number of successful individuals will be hired. The Equal Probability
model is targeted at individuals already hired and is intended to
insure that an equitable number of hired individuals will be
successful. These models can lead to conflicting conclusions about the
fairness of a test. However, there is very little in the literature to
describe the distribution characteristics of the three models and how
their distributions differ.
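
The three indices can be computed directly from the four cell counts. The sketch below is illustrative only: the function name and the example counts are hypothetical, and the cells follow the verbal definitions above (I = selected and successful, II = rejected but successful, III = rejected and unsuccessful, IV = selected but unsuccessful).

```python
def fairness_indices(cell_i, cell_ii, cell_iii, cell_iv):
    """Return the CR, CP, and EP indices for one group from its cell counts.

    cell_iii does not enter any of the three ratios; it is accepted only so
    that the call mirrors the full two-by-two table.
    """
    successful = cell_i + cell_ii      # cells I + II
    selected = cell_i + cell_iv        # cells I + IV
    return {
        "CR": successful / selected,   # Constant Ratio (Thorndike)
        "CP": cell_i / successful,     # Conditional Probability (Darlington)
        "EP": cell_i / selected,       # Equal Probability (Einhorn and Bass)
    }


# Hypothetical counts for one group of 100 applicants.
indices = fairness_indices(cell_i=40, cell_ii=20, cell_iii=25, cell_iv=15)
print(indices)
```

Under each model a test is judged fair when the corresponding index is equal for the majority and minority groups, so in practice the function would be called once per group and the two results compared.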

The purpose of the present study is to evaluate the distribution
of the fairness statistics generated by the Constant Ratio, the
Conditional Probability, and the Equal Probability models of test
fairness. Since the sample size is, in general, much smaller for the
minority sample than for the majority sample, the three fairness
indices will be compared for a large sample and a smaller sample
across different success ratios on both the criterion and the predictor
and also across different correlations of predictor and criterion.
Research studies have shown that sampling error leads to an inverse
relationship between sample size and correlations (6). It is expected
that sampling alone should cause the correlations for the small sample
to be higher than corresponding correlations for the large sample.

Table 1. Three Definitions of Test Fairness

                              PREDICTOR
                      Reject                 Select
CRITERION
  Succeed      II   False Negatives     I    True Positives
  Fail         III  True Negatives      IV   False Positives

CONSTANT RATIO MODEL (CR) - Thorndike (1971)  The ratio of the
proportion successful to the proportion selected should be equal for
both the majority and minority groups.

\[ \frac{I_a + II_a}{I_a + IV_a} = \frac{I_b + II_b}{I_b + IV_b} \]

CONDITIONAL PROBABILITY MODEL (CP) - Darlington (1971)  The
probability of selection, given that an individual is successful,
should be equal for both the majority and minority groups.

\[ \frac{I_a}{I_a + II_a} = \frac{I_b}{I_b + II_b} \]

EQUAL PROBABILITY MODEL (EP) - Einhorn and Bass (1971)  The
probability of success, given that an individual is selected, should
be equal for both the majority and minority groups.

\[ \frac{I_a}{I_a + IV_a} = \frac{I_b}{I_b + IV_b} \]

where a = majority group; b = minority group

The Constant Ratio model is not sensitive to differences in the
correlation of the predictor and criterion, while the Conditional
Probability and the Equal Probability models are. It is expected that
the Constant Ratio model will be more robust to sampling errors
related to sample size than will either the Equal Probability or the
Conditional Probability models.

II. Method.

The data used for analysis in this study were computer generated
by using a Monte Carlo technique. This approach allows the generation
of a number of variables with specified means, standard deviations,
and intercorrelations. The technique essentially allows definition of
the characteristics of a population and then selects samples from
that population. A score of 70 or greater was arbitrarily set as a
cut score, scores above 70 were defined as successful for the
criterion variable, and scores above 70 were defined as selected for
the predictor. Variable means and standard deviations were assigned
values such that either 60 percent, 70 percent, or 80 percent of the
sample would be above the cut score, and predictor/criterion
correlations of .3 or .4 were assigned. Nine variables were generated for
this study by using the proportion above 70 and the correlations
specified in Table 2. The success rates, selection rates, and
predictor/criterion correlations were chosen based on recent
experience with the FAA's Air Traffic Control selection and training
program. The 18 possible combinations of selection ratio, success
ratio, and predictor/criterion correlation described in Table 3 were
evaluated.
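
One replication of such a design can be sketched as follows. This is a minimal illustration, not the program used in the study: the standard deviation of 10, the function names, and the use of a single predictor/criterion pair are assumptions, and the group sizes of 100 and 900 are those reported for the samples in this study.

```python
import math
import random
from statistics import NormalDist

CUT = 70.0  # cut score used for both predictor and criterion


def mean_for_proportion(p_above, sd=10.0):
    """Mean that puts a proportion p_above of a normal variable at or above CUT.

    The standard deviation of 10 is an assumption made for illustration.
    """
    return CUT + sd * NormalDist().inv_cdf(p_above)


def one_replication(p_select, p_success, r, rng, n=1000, n_minority=100, sd=10.0):
    """Generate one sample and return CR, CP, and EP for each group."""
    mx = mean_for_proportion(p_select, sd)
    my = mean_for_proportion(p_success, sd)
    people = []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sd * e1                                    # predictor score
        y = my + sd * (r * e1 + math.sqrt(1 - r * r) * e2)  # criterion score
        people.append((x >= CUT, y >= CUT))                 # (selected, successful)
    rng.shuffle(people)                                     # random group assignment
    groups = {"minority": people[:n_minority], "majority": people[n_minority:]}
    result = {}
    for name, members in groups.items():
        cell_i = sum(1 for sel, suc in members if sel and suc)
        selected = sum(1 for sel, _ in members if sel)
        successful = sum(1 for _, suc in members if suc)
        result[name] = {
            "CR": successful / selected,
            "CP": cell_i / successful,
            "EP": cell_i / selected,
        }
    return result


if __name__ == "__main__":
    print(one_replication(0.6, 0.7, 0.3, random.Random(0)))
```

Because both groups are drawn from the same population, any difference between the minority and majority indices in this sketch reflects sampling error alone.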

Table 2. Proportion Above a Score of 70 Assigned Each Variable and
Relevant Correlations Input Into Monte Carlo Program

Proportion  Var #     1     2     3     4     5     6     7     8     9
   .60        1       X    .3    .4    .3     X     X     X     X     X
   .60        2             X     X    .4     X     X    .3     X     X
   .60        3                   X     X     X     X    .4     X     X
   .70        4                         X    .3    .4     X     X     X
   .70        5                               X     X     X    .3     X
   .70        6                                     X     X    .4     X
   .80        7                                           X    .3    .4
   .80        8                                                 X     X
   .80        9                                                       X

The correlations denoted by X were not used in the analysis.

Each sample that was generated contained 1,000 subjects of which
100 were randomly assigned to the minority group and 900 were assigned
to the majority group. Since both the minority and the majority groups
were from the same population, the predictors should be equally fair
across success ratios, selection ratios, and predictor/criterion
correlations. The CP, EP, and CR indices were calculated for the 18
conditions described in Table 3. This process was repeated 100 times.

Table 3. All Possible Combinations of Selection Ratio, Success Ratio,
and Predictor/Criterion Correlation

        Selection   Success              x           y
          Ratio      Ratio     Rxy    variable    variable
   1       .60        .60      .3        1           2
   2       .60        .60      .4        1           3
   3       .60        .70      .3        1           4
   4       .60        .70      .4        2           4
   5       .60        .80      .3        2           7
   6       .60        .80      .4        3           7
   7       .70        .60      .3        4           1
   8       .70        .60      .4        4           2
   9       .70        .70      .3        4           5
  10       .70        .70      .4        4           6
  11       .70        .80      .3        5           8
  12       .70        .80      .4        6           8
  13       .80        .60      .3        7           2
  14       .80        .60      .4        7           3
  15       .80        .70      .3        8           5
  16       .80        .70      .4        8           6
  17       .80        .80      .3        7           8
  18       .80        .80      .4        7           9

III.  Results. 

Table 4 shows the average proportion above a score of 70 and the
average intercorrelation matrix obtained across the 100 large samples
and the 100 small samples. Table 5 gives the distribution
characteristics of the three fairness indicators for both the large
samples and small samples when the various combinations of selection
ratios, success ratios, and predictor/criterion correlations are
combined. Table 6 gives the distribution characteristics of the large
and small sample fairness indicators when the selection ratio is equal
to the success ratio, when the selection ratio is less than the success
ratio, and when the selection ratio is greater than the success ratio.
Table 7 contains the distribution characteristics of the large and
small sample fairness indicators when the predictor/criterion
correlation is .3 or .4.

In order to compare the fairness indices for the large and small
groups, the indices were expressed first as a ratio of the large group
index to the small group index (LG/SM), and then as a ratio of the
small group index to the large group index (SM/LG). The distribution
characteristics of these indices are described in Table 8.

Table 4.  The Average Proportion Above a Score of 70 and the Average
Correlation Matrix Across the 100 Large Samples and
the 100 Small Samples

For 100 Large Samples

Average
Proportion  Var #     1     2     3     4     5     6     7     8     9
  .608        1       X   0.31  0.42  0.30    X     X     X     X     X
  .603        2             X     X   0.44    X     X   0.31    X     X
  .643        3                   X     X     X     X   0.4?    X     X
  .703        4                         X   0.34   [?]    X     X     X
  .727        5                               X     X     X   0.29    X
  .712        6                                     X     X    [?]    X
  .808        7                                           X   0.37  0.42
  .806        8                                                 X     X
  .818        9                                                       X

For 100 Small Samples

Average
Proportion  Var #     1     2     3     4     5     6     7     8     9
  .590        1       X   0.42  0.53  0.32    X     X     X     X     X
  .583        2             X     X   0.30    X     X   0.42    X     X
  .607        3                   X     X     X     X   0.47    X     X
  .727        4                         X   0.23   [?]    X     X     X
  .714        5                               X     X     X   0.39    X
  .700        6                                     X     X   0.57    X
  .780        7                                           X   0.31   [?]
  .780        8                                                 X     X
  .802        9                                                       X

The correlations denoted by X were not used in the analysis.

Table 5.  Distribution Characteristics for the Three Fairness
Indicators for the Large and Small Samples

                          Range
         Mean     SD     Lo     Hi
CRLG     1.02    .16    .74    1.35
CRSM     1.01    .18    .67    1.49
CPLG     0.77    .07    .63    0.88
CPSM     0.77    .09    .57    0.94
EPLG     0.78    .07    .63    0.88
EPSM     0.77    .09    .57    0.94

         CRLG    CRSM    CPLG    CPSM    EPLG    EPSM
CRLG    1.000    .956   -.821   -.753    .791    .737
CRSM            1.000   -.776   -.787    .758    .755
CPLG                    1.000    .866   -.311   -.298
CPSM                            1.000   -.324   -.202
EPLG                                    1.000    .902
EPSM                                            1.000

where
CR is the Constant Ratio model
CP is the Conditional Probability model
EP is the Equal Probability model
LG is the large sample
SM is the small sample.

Table 6.  Distribution Characteristics for the Three Fairness
Indicators for Large and Small Samples Comparing
Selection Ratio and Success Ratio

Selection Ratio Equals Success Ratio

                           Range
         Mean      SD     Lo     Hi
CRLG    1.017    .024    .97    1.08
CRSM     .999    .045    .89    1.11
CPLG     .773    .058    .68     .86
CPSM     .778    .076    .61     .88
EPLG     .786     [?]    .65     [?]
EPSM     .776    .074    .62     .89

Selection Ratio Is Less Than Success Ratio

                           Range
         Mean      SD     Lo     Hi
CRLG      [?]    .081   1.10    1.35
CRSM    1.220    .099   1.00    1.49
CPLG     .703    .046    .63     .77
CPSM     .698    .057    .57     .79
EPLG     .836    .035    .76     .88
EPSM     .847    .046    .73     .94

Selection Ratio Is Greater Than Success Ratio

                           Range
         Mean      SD     Lo     Hi
CRLG      [?]    .054    .74     .91
CRSM     .825    .068    .67    1.00
CPLG     .836    .035    .76     .88
CPSM      [?]    .046    .73     .94
EPLG     .703    .046    .63     .77
EPSM     .698    .057    .57     .79

where
CR is the Constant Ratio model
CP is the Conditional Probability model
EP is the Equal Probability model
LG is the large sample
SM is the small sample.

Table 7.  Distribution Characteristics for the Three Fairness
Indicators for Large and Small Samples
Comparing Predictor/Criterion Correlations

Predictor/Criterion Correlation Equals .3

                           Range
         Mean      SD     Lo     Hi
CRLG    1.016    .165    .74    1.35
CRSM    1.013    .182    .67    1.49
CPLG     .761     [?]    .63     .87
CPSM     .760    .088    .57     .91
EPLG     .763    .074    .63     .87
EPSM     .758    .087    .57     .91

Predictor/Criterion Correlation Equals .4

                           Range
         Mean      SD     Lo     Hi
CRLG    1.019     [?]    .78    1.28
CRSM    1.017    .173    .69    1.44
CPLG     .781    .069    .68     .88
CPSM     .789    .082    .62     .94
EPLG     .787    .067    .68     .88
EPSM     .790    .081    .62     .94

where
CR is the Constant Ratio model
CP is the Conditional Probability model
EP is the Equal Probability model
LG is the large sample
SM is the small sample.

Table 8.  Distribution Characteristics for Ratios of
the Three Fairness Indicators

                             Range
            Mean     SD     Lo     Hi
CR LG/SM    1.01    .05    .88    1.15
CR SM/LG    1.00    .05    .87    1.14
CP LG/SM    1.00    .06    .86    1.20
CP SM/LG    1.00    .05    .83    1.17
EP LG/SM    1.00    .05    .86    1.20
EP SM/LG    1.00    .05    .83    1.17

              CR      CR      CP      CP      EP      EP
            LG/SM   SM/LG   LG/SM   SM/LG   LG/SM   SM/LG
CR LG/SM    1.000   -.997   -.554    .544    .448   -.438
CR SM/LG            1.000    .574   -.563   -.426    .416
CP LG/SM                    1.000   -.996    .493   -.502
CP SM/LG                            1.000   -.502    .513
EP LG/SM                                    1.000   -.996
EP SM/LG                                            1.000

where
CR is the Constant Ratio model
CP is the Conditional Probability model
EP is the Equal Probability model
LG is the large sample
SM is the small sample.

IV.  Discussion.

As expected, Table 4 shows that the correlations for the small
samples tended to be higher than those for the large samples. It is
not surprising that for all three fairness indicators, the small
sample groups demonstrated greater variation than did the larger
sample groups. The range of the fairness indicator was virtually
identical for the CP and EP models, and was a smaller range than that
for the CR model. This is to be expected since the CP and EP indices
could range only from 0 to 1, while the CR index could range from 0 to
infinity.

When the distributions of fairness indicators are examined for
the three relationships of selection ratio to success ratio
described in Table 6, it can be seen that all three tend to have
moderate values when the selection ratio equals the success ratio; CR
and EP have high values when the selection ratio is less than the
success ratio, while the CP value tends to be higher when the
selection ratio is greater than the success ratio. Both CP and EP show
the greatest amount of variance when the selection ratio is equal to
the success ratio, while CR shows the greatest amount of variance when
the selection ratio is less than the success ratio. When the
distributions of the fairness indices for the large and small samples
are examined separately for correlations of .3 and .4 (see Table 7),
all three fairness indicators have lower means and higher standard
deviations for the lower correlation.

The fairness indicator ratios described in Table 8 show that the
distribution differences observed in Table 5 virtually disappear. The
means of these ratios are around 1.00 (as they should be when the test
is "fair"); the small standard deviations and the range of the ratios
are almost identical for the large group/small group and for the
small group/large group indices. It would appear that all three
fairness indicators show similar patterns of covariance between the
large sample and small sample groups.

Based on the data from the present study, there is no compelling
statistical reason to choose any one of the three fairness indicators
over the others. The range of the values of the indicators is
affected by both the relationship of selection and success ratios, and
predictor/criterion correlations. However, while the magnitude of the
fairness indicator may vary, the relationship of the fairness
indicators for the large and small groups remains about the same, no
matter which fairness indicator is used. The three fairness indicators
are equally likely to lead the investigator to conclude that a test is
fair when the majority and minority groups are chosen from the same
population and differences between the groups are due to sampling.
Quite frequently, however, this is not the case in the real world.
Members of minority and majority groups may be recruited in different
ways and may differ dramatically in education, experience,
socioeconomic status, and other demographic variables that will affect
their performance on the selection devices. The applicants from the
majority and minority groups may have different means on the selection
tests, and if the means for the minority group are lower than the
means for the majority group, then the proportion selected from the
minority applicants could well be less than four-fifths the proportion
selected from the majority applicants. If this is the case, then the
Uniform Guidelines state that adverse impact has occurred, and the
user must demonstrate that the selection is fair.
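The four-fifths comparison described above reduces to simple arithmetic on the two selection rates. The following is a minimal sketch of that check; the function name and the example counts are hypothetical and are not data from the study.

```python
def adverse_impact(minority_selected, minority_applicants,
                   majority_selected, majority_applicants,
                   threshold=0.8):
    """Return (impact ratio, flag) under the four-fifths comparison.

    The impact ratio is the minority selection rate divided by the
    majority selection rate. The flag is True when the ratio falls
    below the threshold (four-fifths by default).
    """
    if minority_applicants <= 0 or majority_applicants <= 0:
        raise ValueError("applicant counts must be positive")
    if majority_selected <= 0:
        raise ValueError("majority selection rate must be positive")
    minority_rate = minority_selected / minority_applicants
    majority_rate = majority_selected / majority_applicants
    ratio = minority_rate / majority_rate
    return ratio, ratio < threshold


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 minority and 540 of 900 majority
    # applicants are selected.
    print(adverse_impact(30, 100, 540, 900))
```

A ratio below the threshold only signals that the user would need to demonstrate fairness; it does not by itself say which fairness model applies.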

The Constant Ratio model could be used at this point to determine
if the differential proportion selected for the minority group is
compensated for by a differential success rate. If the CR definition
of fairness is met, it is unlikely that the selection procedure as
defined will be perceived as unfair. The CR model is insensitive to
the magnitude of the correlation of the predictor and the criterion,
so it would be possible to meet the CR definition of fairness while
still selecting majority and minority applicants with vastly different
probabilities of success. If this is the case, and if the minority
group members selected have a lower probability of success than the
majority group members, the minority group members will have a higher
attrition rate during the training process than the majority group
members. Since the Uniform Guidelines are extended to cover not just
selection procedures, but also employment decisions including
promotion, referral, retention, and transfer, the user may find that at
some point after selection some other employment decision demonstrates
adverse impact. If the Equal Probability model of test fairness is
used, this problem may be avoided, but unless the regression lines for
the minority and majority groups have the same slopes, its use could
result in the disproportional selection of one group or the other. The
Conditional Probability model could be used to insure that appropriate
numbers of successful individuals are selected, but its use too could
result in an inequitable selection ratio.

The test user is in a dilemma, as current definitions and
practices stand. In order to meet the definition of fairness at the
point of selection, the Constant Ratio model may be employed, but use
of this model may result in adverse impact and unfairness at some
later employment point. The acceptability of the various fairness
decision models will no doubt be determined by the courts. In the
ideal case, in which the minority and majority samples are selected
from the same population and their regression lines are identical, all
three models will agree, as they did in the present study. If the
test user is in the unpleasant situation in which the models would
lead to conflicting conclusions about test fairness, then some
corrective action must be taken. If the Equal Probability model
indicates test fairness, but the CR and CP do not, then an unfair
proportion of successful minorities are being rejected, and a lower cut
score may be justifiable. This will occur when the predictor/criterion
correlation is higher for the minorities than for the majority. If the
Conditional Probability model indicates test fairness, but the EP and
CR do not, then the predictor/criterion correlation is lower for the
minority than for the majority, and resolution of this problem may
require either development of new selection procedures or recruitment
of a minority applicant population that more closely resembles the
majority sample.

If the use of different cut scores is not feasible, or if the
data indicate that the minority applicants differ from the majority
applicants in how well their performance can be predicted, the test
user could examine recruitment practices to see if efforts could be
made to recruit minority applicants who are more like the majority
applicants in terms of characteristics related to the probability of
success. The most recent version of the Uniform Guidelines emphasizes
the role of recruitment and its effect on fairness. This emphasis on
recruitment indicates that the effects of recruitment practices on
selection and other employment decisions will be a part of the
evaluation of the fairness of a selection procedure. Modification of
minority recruitment practices could be an effective means of
bringing existing selection procedures into compliance with the
Uniform Guidelines without necessitating the development of new
selection devices.

I.  Introduction.

In laboratory research designed for eventual application to work
settings, frequently the purpose is to be able to generalize
performance of one population (say, college students or aviation
cadets) on a complex laboratory task to a population that is highly
selected for ability and motivation, e.g., airline pilots or air
traffic controllers. When the tasks under consideration are complex,
there is frequently a training phase of the study during which the
subjects are familiarized with the tasks. If the aim of the research
is to generalize to a population that is both highly skilled and
motivated, it is often appropriate to select subjects during this
training phase who can perform the test tasks at some minimum level of
competence and who exhibit sufficient motivation to maintain
consistently acceptable performance. This is especially important in
this type of research because data collection is often very time
consuming and costly, and practical considerations limit the sample
size. An incompetent or unreliable subject can dramatically affect
the accuracy of the results of such studies and, therefore, the
appropriateness of applying research outcomes to the target
population. An incompetent subject may be identified by specifying a
minimum level of performance in the training phase of a study.
However, especially in cases where repeated measure designs are
employed with a small number of subjects, it would also be desirable
to identify subjects who exhibit low reliability during training in
order to eliminate such subjects from further training and testing.
In such cases, grossly unreliable performance may be reasonably
interpreted to indicate inadequate motivation or ability on the part of
a subject. That is, a subject who attends to the task and performs
adequately part of the time and at other times virtually ignores the
task and performs at very poor levels will have corresponding
variations in the task performance measure. Such variability of
performance would not be likely (or acceptable) in the "real life"
situations that are the ultimate concern of such research. If, for
example, the researcher is generalizing to pilot performance, a pilot
who was occasionally uninterested in the accuracy of his landing
approach would be rapidly eliminated from the population of pilots, if
not the population of the living. Thus, the elimination of subjects who
clearly are able to perform adequately but who are unwilling or unable
to maintain acceptable levels of performance may be an important
factor in the generalizability of research findings.

In research designs where multiple measures of the same variable
are made on the same subject (repeated measures), reliability of the
measure is frequently estimated through the use of analysis of
variance (ANOVA). The intent of such an estimate is to assess the
stability of the test or to define homogeneous subsets of test items.
The present study develops a method that may be used to estimate the
reliability of an individual subject's performance across successive
administrations of the same task or parallel versions of the same test
and identify subjects with extremely low reliabilities. Identification
of such subjects is particularly useful when the sample size is small
and an unreliable subject can significantly affect the validity of the
research results.

II.  Method. 

If, in a subjects-by-measures data matrix, all within-measure
variances are equal, then the average correlation (including the
diagonal) ($R$) among the measures is equal to the sum of squares for
subjects ($SS_s$) divided by the quantity, total sum of squares
($SS_t$) minus sum of squares between measures ($SS_a$):

$$R = SS_s / (SS_t - SS_a).$$

If within-measure variances are unequal, then $R$ in the above
expression is a function of the sum of the covariance matrix rather
than the average correlation.

This average correlation among measures ($R$) is an estimate of
reliability of the measures, if they are parallel (6, p. 61).
Parallel measures are distinct measurements that measure the same
thing on the same scale (6, p. 48). Therefore, the intercorrelations
of parallel measures should be equal and are the upper bound on
correlations with other tests (6, p. 59).

Since the purpose of this analysis is to derive an index of
subject reliability rather than measure differences, all measures must
be standardized within administrations. This has the effect of
equalizing the within-measure variances and results in reducing the
sum of squares for measures ($SS_a$) to zero.

Since $SS_a = 0$,

$$R = SS_{subj} / SS_{total}.$$

$SS_{total}$ is equal to the sum of $SS_{subj}$ and the error term
$SS_{ws}$ (sum of squares within subjects). $SS_{ws}$ is the sum of the
squared deviations of test scores around the individual subject's mean
test score, which is equal to the sum of squares for the
subjects-by-measures interaction:

$$SS_{total} = SS_{subj} + SS_{ws} = SS_{subj} + SS_{subj \times meas}.$$

$R$, which is used as an estimate of reliability, can then be
defined as an inverse function of the within-subject variance:

$$R = 1 - SS_{ws} / SS_{total}.$$

The within-subject variance may be calculated for any subject or group
of subjects and subsequently used as an index of reliability for that
subject or group of subjects.
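The identity between the sums-of-squares ratio and the average correlation can be checked numerically. The sketch below is an illustration only: the function names and the example scores are hypothetical, measures are standardized with the population standard deviation, and every measure is assumed to have nonzero variance.

```python
from statistics import mean, pstdev


def standardize_columns(data):
    """Standardize each measure (column) to mean 0 and variance 1."""
    z_cols = []
    for col in zip(*data):
        m, s = mean(col), pstdev(col)  # assumes s > 0 for every measure
        z_cols.append([(v - m) / s for v in col])
    return [list(row) for row in zip(*z_cols)]


def reliability_from_sums_of_squares(z):
    """Return (SS_subj / SS_total, 1 - SS_ws / SS_total)."""
    n_meas = len(z[0])
    # The grand mean is zero after standardizing, so SS_total is the
    # plain sum of squared scores.
    ss_total = sum(v * v for row in z for v in row)
    ss_subj = sum(sum(row) ** 2 / n_meas for row in z)
    # SS_ws: squared deviations around each subject's own mean.
    ss_ws = sum((v - mean(row)) ** 2 for row in z for v in row)
    return ss_subj / ss_total, 1.0 - ss_ws / ss_total


def average_correlation(z):
    """Mean of the full correlation matrix, diagonal included."""
    n_subj, n_meas = len(z), len(z[0])
    total = 0.0
    for j in range(n_meas):
        for k in range(n_meas):
            total += sum(row[j] * row[k] for row in z) / n_subj
    return total / (n_meas * n_meas)


if __name__ == "__main__":
    # Hypothetical scores: 5 subjects by 3 administrations.
    scores = [[52, 55, 50], [61, 64, 66], [47, 45, 49],
              [70, 68, 73], [58, 60, 57]]
    z = standardize_columns(scores)
    print(reliability_from_sums_of_squares(z), average_correlation(z))
```

All three quantities agree up to rounding error, which is the relationship the derivation relies on.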

In order to test the reliability of a given subject against the
overall level of reliability, the within-subject variance for a given
subject ($V_i$) may be compared with the within-subject variance
associated with scores from the remainder of the subjects ($V_{-i}$).
Since these two variances are independent if all subjects are
independent, they may be compared by use of an F ratio. A significant
$V_i / V_{-i}$ would indicate that subject $i$ was significantly less
reliable at the specified $\alpha$ level than the rest of the subject
sample.

The calculational procedure for these tests is as follows.
Assume a data matrix $X_{ij}$ with $i = 1$ to $N$ subjects and $j = 1$
to $M$ measures. These measures might reasonably be repeated measures
on the same task or measures from parallel forms of the same task. The
scores in the data matrix would first be standardized so that all
column (measure) means and variances are equal.

Let $V_i$ equal the within-subject variance of subject $i$.

$$SS_{within\,i} = \sum_j X_{ij}^2 - \Big(\sum_j X_{ij}\Big)^2 / M \qquad (M = \text{number of measures})$$

$$df_{within\,i} = M - 1, \quad \text{so} \quad V_i = SS_{within\,i} / df_{within\,i}.$$

Let $V_{-i}$ equal the within-subject variance of all subjects except $i$.

$$SS_{-i} = SS_{within\,subj} - SS_{within\,i} = SS_{total} - SS_{subj} - SS_{within\,i}$$

$$df_{-i} = df_{within\,subj} - df_{within\,i} = (M-1)(N-2) \qquad (N = \text{number of subjects})$$

$$V_{-i} = SS_{-i} / df_{-i}$$
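The computational steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' program: the function names and the example data are hypothetical, and the resulting F ratios would still be referred to a table of the F distribution with the degrees of freedom returned.

```python
from statistics import mean, pstdev


def standardize_columns(data):
    """Standardize each measure (column) to mean 0 and variance 1."""
    z_cols = []
    for col in zip(*data):
        m, s = mean(col), pstdev(col)  # assumes s > 0 for every measure
        z_cols.append([(v - m) / s for v in col])
    return [list(row) for row in zip(*z_cols)]


def subject_f_ratios(z):
    """Return (list of V_i / V_-i, df numerator, df denominator)."""
    n_subj, n_meas = len(z), len(z[0])
    if n_subj < 3 or n_meas < 2:
        raise ValueError("need at least 3 subjects and 2 measures")
    # SS within subject i: sum of squares minus (row sum)^2 / M.
    ss_within = [sum(v * v for v in row) - sum(row) ** 2 / n_meas
                 for row in z]
    ss_within_all = sum(ss_within)
    df_i = n_meas - 1
    df_rest = (n_meas - 1) * (n_subj - 2)
    ratios = []
    for ss_i in ss_within:
        v_i = ss_i / df_i
        v_rest = (ss_within_all - ss_i) / df_rest
        ratios.append(v_i / v_rest)
    return ratios, df_i, df_rest


if __name__ == "__main__":
    # Hypothetical data: the last subject alternates between good and
    # very poor performance across four administrations.
    scores = [[10, 11, 10, 11], [12, 13, 12, 13], [14, 15, 14, 15],
              [16, 17, 16, 17], [20, 5, 21, 4]]
    ratios, df1, df2 = subject_f_ratios(standardize_columns(scores))
    print(ratios, df1, df2)
```

In the hypothetical data the erratic subject produces by far the largest ratio, which is the pattern the test is meant to flag.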

Since $V_i$ and $V_{-i}$ are independent variances if all subjects are
independent, the ratio between them is distributed as F, with $(M-1)$
and $(N-2)(M-1)$ degrees of freedom. A significant $V_i / V_{-i}$
indicates that subject $i$ is less reliable in his performance than the
other subjects.

A problem in the application of this method is that it involves
multiple tests, i.e., each subject is tested separately for
reliability. In experimental situations where multiple comparisons
are made, the Type I error rate (alpha) is much higher than the alpha
level chosen for the individual tests. A straightforward solution to
this problem is to use a smaller alpha value, which takes into account
the number of comparisons. A simple formula (8) for the determination
of alpha resulting from multiple comparisons is

$$\alpha_e = 1 - (1 - \alpha)^c,$$

where $\alpha_e$ is the error rate per experiment, $\alpha$ is the
error rate per comparison, and $c$ is the number of independent
comparisons. Although the comparisons made in the present study are not
independent, this approach will identify subjects who are extreme. A
table of critical values for $\alpha_e$ may be found in Jacobs (5).
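The formula can be evaluated directly, and it can be inverted to find the per-comparison alpha that holds the per-experiment rate at a chosen level. The sketch below assumes independent comparisons, as the formula does; the function names are hypothetical.

```python
def experimentwise_alpha(alpha, c):
    """Error rate per experiment for c independent comparisons."""
    return 1.0 - (1.0 - alpha) ** c


def per_comparison_alpha(alpha_e, c):
    """Per-comparison alpha that yields a per-experiment rate alpha_e."""
    return 1.0 - (1.0 - alpha_e) ** (1.0 / c)


if __name__ == "__main__":
    # Ten subjects, each tested at alpha = .05.
    print(experimentwise_alpha(0.05, 10))
    # Per-comparison level needed to hold the experiment at .05.
    print(per_comparison_alpha(0.05, 10))
```

With ten subjects each tested at .05, the per-experiment rate is roughly .40, which shows why a smaller per-comparison alpha is recommended.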

In  some  situations,  the  experimenter  may  want  to  estimate  the 
effect  on  R  of  deletion  of  certain  subjects.    This  procedure  is  not 
readily  amenable  to  significance  testing  but  may  be  used  to  get  a 
"feel"  for  the  data. 

$R_{-i}$ = an estimate of the average correlation that would result if
subject $i$ were removed (assuming that for all measures, mean = 0 and
s.d. = 1).

R.1  =  (SS.i  -(ZXij)2/MN/(SStotal  -  (N/(N-l)ZXf 
J  J 

A comparison of $R$ and $R_{-i}$ ($R - R_{-i}$) may be used to provide
an index of the effect on overall reliability of a given subject's
scores.

III.  Discussion. 

The method presented here provides researchers with a tool that
may be used to identify subjects whose performance on repeated
measures or parallel measures is unusually inconsistent. The procedure
can be used for preselection of subjects for experimental studies in
human factors research in which practical considerations dictate
small sample sizes.

The "prediction of predictability" is a problem that has long
plagued researchers (2,3,7). Using a subject reliability index as a
predictability measure is a concept that has not been applied. Of
course, research utilizing this method is needed to determine its
potential usefulness.

A Comparison of Two Criterion-Referenced Scoring Procedures for
An Answer-Until-Correct, Multiple-Choice Performance Test


by 

John  B.  Meredith,  Jr.,  Ph.D 
J.  Thomas  Martin,  Jr. 
Data-Design  Laboratories 
Norfolk,  Virginia  23502 
November  1978 


In many testing programs it is desirable to assess the status
of the examinee with respect to a performance standard or
criterion. Criterion-referenced testing (CRT) can serve as
a vehicle for such an assessment. The purpose of this report
is to present the results of a comparison of two CRT methods
applied to a paper and pencil simulated performance test known
as the Decision Measurement System (DMS).

The DMS uses a multiple-choice, answer-until-correct procedure
which leads the examinee through a series of questions in
order to "trouble-shoot" a fault within the equipment using
pictorial representations of panel indications. Each examinee
marks (swipes) his response on a latent image answer sheet. If
the answer is correct he is directed to the next question;
if his answer is incorrect he is allowed to make another swipe
and continues until he has chosen the correct answer.

Two CRT methods were examined to classify examinees into pass/
fail categories. The first was the present method used by the
Navy. This method invoked a predetermined passing score of
62.5 for the DMS test scores, where an examinee's score is
determined by exponentially combining the number of items
answered correctly by him on the first, second, and third swipes.

The second method involved an extension of the Minimally
Acceptable Performance Level (MAPL) technique, introduced by Nedelsky
(1954) and modified by Meredith (1976), to set a passing score
based on the sum of the expected number of swipes required by
the Minimally Qualified Examinee (MQE) to complete the items
on the DMS. The expected number of swipes for each item was
determined from subject matter expert evaluations concerning
the attractiveness of item alternatives to the MQE.

Each  method  was  applied  to  the  DMS  results  obtained  from  30 
examinees,  who  had  been  administered  the  Sonar  Sounding  Set 
DMS  during  January  and  February  1978. 

These CRT methods were evaluated using two procedures. The
first procedure was to determine the reliability of each
classification method. The second procedure was to validate the
CRT classifications of the examinees by determining the
correlation of the pass/fail classifications with four proficiency
indicators (average knowledge test scores, average skill test
scores, paygrade, and number of patrols).

For each CRT method, examinees who were classified as meeting
or exceeding the minimum passing score were assigned a score
of one; those classified as not meeting the minimum passing
score were assigned a score of zero. These dichotomous scores
were used to determine the reliability and concurrent validity
of both CRT methods.

The reliability of each classification method was determined
by randomly splitting the DMS into two parallel sections and
establishing, for both methods, a passing score on each
section. Next, the proportion of consistent classifications
across test sections was determined for both classification
methods as an indication of their reliabilities.
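The consistency index described above is the proportion of examinees who receive the same pass/fail classification on both test sections. A minimal sketch follows; the function names, section scores, and passing scores are hypothetical and are not the study's data.

```python
def classify(scores, passing_score):
    """Score 1 for meeting or exceeding the passing score, else 0."""
    return [1 if s >= passing_score else 0 for s in scores]


def classification_consistency(section_a, section_b):
    """Proportion of examinees classified alike on both sections."""
    if not section_a or len(section_a) != len(section_b):
        raise ValueError("sections must be non-empty and equal in length")
    agree = sum(1 for a, b in zip(section_a, section_b) if a == b)
    return agree / len(section_a)


if __name__ == "__main__":
    # Hypothetical section scores for four examinees.
    first_half = classify([70.0, 50.0, 65.0, 40.0], passing_score=62.5)
    second_half = classify([68.0, 64.0, 71.0, 45.0], passing_score=62.5)
    print(classification_consistency(first_half, second_half))
```

Each method would supply its own passing score for each section; the consistency calculation itself is the same for both.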

The reliability of the present Navy passing score classification
method was .38 while the reliability of the MAPL classification
method was .64. (Note that these are conservative estimates of
the classification reliabilities, since the classifications
were based on half the number of original test items.) This
difference in the reliability of the two classification methods
was expected since the MAPL technique adapts to the difficulty
of the performance test by setting a lower passing score (based
on a greater number of swipes) on more difficult tests. The
inflexibility of the 62.5 criterion in contrast does not account
for tests of greater or lesser difficulties.

Both classification methods yielded approximately equal
correlation coefficients with the four proficiency level
indicators. Table 1 gives the Pearson product-moment correlation
coefficients for each method with the four proficiency level
indicators. Neither classification method resulted in
significantly higher correlation coefficients for any of the four
proficiency level indicators.

The criterion-referenced MAPL technique was found to be the
more efficient means for classifying examinees. Also, both
classification methods were found to be equally valid when
compared to four proficiency criteria. These results, however,
were based on only 30 examinees, and before any sweeping
generalization can be made, it is suggested that this
methodology be applied to a larger set of data. Further, the MAPL
method for evaluating criterion-referenced performance tests
should be compared to other CRT procedures, both empirically
and practically.




Proficiency  Level 
Indicators 


Average  PTEP* 
Knowledge  Score 

Average  PTEP* 
Skill  Score 


Present  Navy 
Classification  Method 


MAPL Classification
Method 


Paygrade 

Number  of  Patrols 


*PTEP: Personnel and Training Evaluation Program

An analysis of the OE concept and suggested improvements
C.E. George, Henry Kinnison and H.W. Smith
Texas Tech University

We present some observations of the Army Organizational Evaluation
(OE) program (USACGSC, 1978). Our concern is that the General
Organizational Questionnaire (GOQ, Appendix A of TRADOC-OETC, 1974)
seems to address only garrison effectiveness. If this is the case, is
it possible that a tactical unit might become more effective as a
garrison organization but lose some potential combat effectiveness as a
result of an OE program?

A  model  of  unit  effectiveness  presented  earlier  (George,  1977; 
George  and  Smith,  1978)  suggests  that  this  may  be  a  real  possibility 
(Figure  1).    This  model,  based  on  experimental  work  with  Infantry  units, 
indicates  that  the  GOQ  factors  are  weak  and  uncertain,  even  potentially 
misleading, predictors of small unit tactical proficiency. The GOQ
factors (communication flow, decision making, motivation, integration
of personnel with unit, and identification with unit) are essentially
"symptomatic" variables rather than direct determinants of tactical
proficiency in small Infantry units (Figures 2 and 3).

It is recognized, of course, that the services must produce troop
satisfaction and motivation as measured by the GOQ factors. The point
is that one may do this through organizational climate (higher level
leadership) in ways that may fail to affect, or even degrade, tactical
performance. On the other hand, it is suggested that these ends can
be [illegible] tactical problems via teamwork training.

The Army OE program

This is a voluntary program; confidentiality is promised and the
anonymity of respondents is respected. The OE process consists of
four steps: 1) assessment, 2) planning, 3) implementation, and 4)
evaluation/follow-up. A central component of the assessment step is the
84-item (plus several demographic items) GOQ. This questionnaire surveys a
standard upon which to base the later steps. The items are generally
easy to read and are unambiguous. They are written to fit any type of
organizational setting; that is, they ask about co-workers and
supervisors rather than NCOs, officers and peer-group soldiers. This
generality provides some gain in adaptability, but it probably also
produces some feeling among combat branch, company and battalion level
commanders that it is too general to fit their specific organizational
concerns.

Although a commander is encouraged to work with the OE officer to
add items to the GOQ, this is a laborious and uncertain procedure which
may fail to produce interpretable data. It is suggested that a subset
of branch specific items be developed and factor analyzed, along with
the current items, on representative samples of soldiers in tactical
units. At worst, this would add to the face validity of OE procedures
for unit commanders. At best, it might produce a more sensitive survey
instrument. It is our feeling, however, that some way must be
found to measure teamwork directly in the small tactical unit and to
include this evaluation as a component of the present OE program.

A specific concern we have about the GOQ is the implicit idea of
the soldier as a passive recipient of information from above or a
provider of information upon request. The basic requirement for developing
teamwork in the lower level tactical unit, especially in the case
of Infantry, is an active information seeking soldier who recognizes
and meets the needs of others for data in fluid, confusing situations.
It may seem unfair to criticize the GOQ for not doing something it was
not designed to do. On the other hand, if these units are as different
from organizations in general as we think they are, it may be vital to
consider the possibility that OE users could be led to confuse garrison
with operational effectiveness.

The  General  Organizational  Questionnaire  interpretive  package 

Computer printouts of GOQ results provide the user with highly
interpretable data. Especially valuable are the breakdowns across
demographic variables and by subunits of the unit being evaluated. On the
negative side, the OE officer and user may be led to overinterpret the
differences among subunits. Differences are said to be "moderately
significant" at the [illegible] level. Users are warned against
overinterpreting differences between medians based on small numbers of
cases, but apparently not warned to take into account the total number of
statistical comparisons being made.

Summary

The Army OE program in general, and the GOQ specifically, have many
strengths. Wider usage by lower level, combat branch commanders will
probably require more believable safeguards re: confidentiality, better
face (and hopefully construct) validity and perhaps further refinement
of interpretive guidelines. Normative data from similar units in similar
circumstances could also be most helpful.

Figure 1. A model of small unit functioning.

Figure 2. Model of small unit structural characteristics.

I.  Symptomatic variables (individual and group characteristics within the
    unit-task-setting environment)

    A. Sociometric (questionable administrative utility)
       1. affection (stress resistance)
       2. respect (mutual confidence)

    B. Unit member motivation to maximize:
       1. personal achievement (intragroup competitive)
       2. socializing (emotional support)
       3. unit efficiency (coordination)

II. Behavioral coordination of response

    A. Shared attention among:
       1. one's primary job
       2. status of co-workers
       3. machine(s) in the unit system
       4. extra-unit task environment

    B. Recognition of initiative taking requirement

    C. Respond to requirement
       1. individual, immediate action
       2. communicate status to other(s)

Figure  3.    Small  unit  level  correlates  of  performance. 
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SECTION  11 
TESTING:    Techniques  and  Technologies 
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The  Development  of  a  Technique  for  Using  Occupational 
Survey  Data  to  Construct  and  Weight  Computer-Derived  Test  Outlines 
for  Air  Force  Specialty  Knowledge  Tests  (SKTs) 

by 

William  J.  Phalen 

Air  Force  Human  Resources  Laboratory 
Brooks  AFB,  Texas 


The  opinions  and  conclusions  expressed  in  this  paper 
are  those  of  the  author  and  are  not  necessarily 
those  of  the  United  States  Air  Force. 


Introduction 

In June 1974, the Government Equal Employment Opportunity Coordinating
Council (EEOCC) issued a slender but highly significant publication
entitled Uniform Guidelines on Employee Selection Procedures. Its
purpose was to spell out in some detail the intent of several acts of
Congress and executive orders concerning the need to validate tests
used for personnel selection. The Government specifically urged that
such tests be validated against "a systematic and appropriately
comprehensive analysis of the job for which the selection procedure is to be
used." The Commander of the USAF Occupational Measurement Center at
that time, Col Kaapke, directed that a project be initiated to establish
procedures for making systematic, efficient, and timely use of job
analysis data in the construction of Specialty Knowledge Tests (SKTs).
This was to ensure that SKTs would be constructed in accordance with
the proposals of the Uniform Guidelines. I was selected as project
officer and began working on the project in September 1975.

Initial  Assessment  of  the  Problem 

There  had  been  numerous  previous  attempts  to  integrate  occupational 
survey  data  into  the  test  development  process,  none  of  which  had  met 
with  much  success.    Early  attempts  had  centered  on  the  psychologist  and 
his  team  of  subject-matter  specialists  poring  over  the  bound  volumes  of 
computer  printout  material  that  accompanied  the  final  report  of  a  job 
analysis.    In  most  cases,  the  test  psychologist  lacked  the  expertise  to 
read  the  printouts  and  locate  relevant  information.    Even  when  the  test 
psychologist  was  knowledgeable,  the  printouts  themselves  were  not  in  a 
format  that  would  be  amenable  to  direct  use  in  test  outline  development. 
Added  to  this,  there  were  severe  time  constraints  levied  on  the  various 
phases  of  the  test  development  process,  with  the  inevitable  result  that 
the  survey  data  printouts  were  laid  aside  early  in  the  project  without 
having made any significant contribution to test outline development.
Later attempts at making survey data useful for test outline development
involved the preparation of a much smaller package of computer printouts
which included job descriptions for SKT-relevant paygrade groupings.
Oftentimes the occupational analyst responsible for the specialty would
meet with the SKT team and explain how to read the printouts and how
the data might be applied to the test construction process. While this
procedure engendered greater use, the results still left much to be
desired. The data package was still too large and complex and no
systematic way was devised to make the data an integral part of the
test outline development process. At best, the team would use the
survey data to confirm decisions already made or to resolve debates
concerning the degree to which certain tasks were performed in a
specialty. But this was done after the test outline had already been developed
and percentage weights had been assigned to the various content areas
independently of the survey data.

If significant progress was to be made in the use of survey data
in test development, it appeared that the main questions to be addressed
were: Is occupational survey data relevant to the development of valid
SKTs? If so, how can its relevance and usefulness be maximized?

The first question was easy to answer. Survey data had much to
offer in the way of validating the content of SKTs in terms of job
relevance. Because survey data are gathered on hundreds, or even
thousands, of job incumbents, they provide a more representative and
reliable sampling of task performance than could possibly be obtained
from the judgments of three or four subject-matter specialists, no
matter how broadly experienced they were. Answering the second question
on how to maximize the relevance and usefulness of occupational survey
data in test development is the topic that will concern the remainder
of this paper. Relevance and usefulness actually subsume many other
questions such as: Should all tasks in a job inventory be considered
for use on an SKT, or is there only a relatively small subset of tasks
in each survey that would be relevant and useful? What might be
meaningful criteria for selecting relevant and useful tasks? Can a valid
criterion be developed that would permit the direct evaluation of tasks
on testing importance? Can survey data be used to determine not only
test outline content, but also the percentage weights for outline
areas? What would be the most useful format for presenting tasks for
test outline and test item development? Let me now address these
questions one at a time.

All  Tasks  vs.  Subset  of  Tasks 

Careful examination of SKT requirements and survey limitations
revealed types of tasks which could be eliminated from consideration.
Eight categories of task unusability for SKT purposes were identified.
These, combined with a ninth "usable" category, were ordered to form a
nine-point pseudo-scale that could be used by subject-matter experts to
classify all tasks in a job inventory in terms of usability. The
task usability scale is shown in Figure 1.


Figure  1.    Instructions  for  Coding  Usability 
of  Tasks  for  SKT  Purposes 


RECORDING TASK USABILITY FOR SKT PURPOSES

A. Rate each task in the "Time Spent" column or right-hand margin with
   one of the following codes (use the lowest number if more than one
   code applies; e.g., if codes 1, 4, and 5 apply, record "1"):

   Code  Meaning

   1     Task is totally inapplicable to this AFSC/shredout (if task
         is even slightly applicable, use code 8)
   2     Task is obsolete or will soon be obsolete
   3     Task statement doesn't make sense or is uninterpretable
   4     Task to a large extent duplicates another task (give duty and
         task identifier of duplicate task; e.g., B 32)
   5     Task cannot be tested by paper-and-pencil test
   6     Task applies to PFE, USAF 9-Skill Level Upgrade Exam, or USAF
         Supervisory Exam
   7     No SKT-usable reference covers this task (usually determined
         when attempting to write test item)
   8     Task is not important enough to be tested on (e.g., very few
         airmen perform it, or it is extremely easy to learn, etc.)
   9     Task is important enough for testing on at least one level of
         the AKT/SKT

B. If a task statement requires revision, record the STS area(s) and
   task usability code for the task statement as it is currently
   worded, and then pencil in the necessary revision.

C. If additional tasks need to be included in the Job Inventory, write
   in the tasks on the pages provided at the back of the Inventory
   booklet, preceded by the appropriate duty identifier (e.g., A, B,
   C, etc.), and record the applicable STS areas and task usability
   codes for these tasks as described above.


The first four categories of unusability (1 through 4) are attributable
to problems in the survey instrument. The next four categories of
unusability (5 through 8) arise from special requirements of SKTs. The
last category (9) is the only one which states that a task is usable.
A team of subject-matter experts was asked to rate every task in the
applicable job inventory on task usability for SKT purposes. Only one
rating was to be given to each task: namely, the lowest-numbered
rating that applied, the reason being that the lower the number, the
more unusable the task. This procedure also insured that no category
numbered lower than the one assigned applied to the task.
The rating given was to be based on a consensus of the subject-matter
experts. Separate individual ratings were not permitted, because
averaging would be inappropriate for a pseudo-scale such as this one.
In general, the task usability ratings were provided by the SKT minor¹
revision team, so that the codes would be available for input to the
computer in selecting tasks for a computerized test outline to be
prepared for the SKT major¹ revision project the following year.
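
The lowest-number rule in Figure 1 can be sketched in a few lines of Python. This is an illustration only, not the program used in the project; the function names `usability_code` and `is_usable` are hypothetical.

```python
from typing import Iterable

USABLE_CODE = 9  # code 9 in Figure 1: important enough to be tested


def usability_code(applicable_codes: Iterable[int]) -> int:
    """Return the single code to record: the lowest-numbered code that applies."""
    codes = set(applicable_codes)
    if not codes:
        raise ValueError("at least one code must apply to a task")
    if not codes <= set(range(1, 10)):
        raise ValueError("codes must be integers from 1 to 9")
    return min(codes)


def is_usable(applicable_codes: Iterable[int]) -> bool:
    """A task is usable only when no unusability code (1-8) applies."""
    return usability_code(applicable_codes) == USABLE_CODE
```

For the example given in the instructions, `usability_code([1, 4, 5])` records code 1. Because the codes form a pseudo-scale, the sketch takes a minimum and never an average.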

Additional requirements were also established for the selection of
usable tasks. These requirements were based on task data parameters.
To be selected as usable, a task had to be performed by at least 20% of
one of the three groups representing the three AKT/SKT testing levels:
E-2/E-3 (Apprentice Knowledge Test), E-5, and E-6/E-7. Tasks with
lower percentages were excluded as unusable, as were tasks performed by
a higher percentage of supervisory personnel (9-skill level) than
journeyman personnel (5-skill level) and tasks which were not performed
by at least 10% of job incumbents in each of the major using commands.
The additional requirements were based on the fact that SKTs are Air
Force-wide tests that include only specialty knowledge (no general
supervision) and should cover only tasks which are performed by a
significant percentage of members across the specialty (not specific to
a major command). While it is true that the additional requirement
aimed at the elimination of supervisory tasks overlaps code 6 of the
usability scale, it has been found to be a useful backup to override
coding errors.
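
The data-based requirements above amount to three checks applied on top of the usability code. The following Python sketch shows one way to express them. The `TaskData` record, its field names, and the command labels used in any example are assumptions made for illustration; only the 20% and 10% thresholds and the supervisory comparison come from the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskData:
    usability_code: int               # consensus code from Figure 1
    pct_by_level: Dict[str, float]    # percent performing, per testing level
    pct_skill5: float                 # percent of 5-skill level performing
    pct_skill9: float                 # percent of 9-skill level performing
    pct_by_command: Dict[str, float]  # percent performing, per major command


def passes_data_requirements(task: TaskData,
                             level_min: float = 20.0,
                             command_min: float = 10.0) -> bool:
    # Performed by at least 20% of at least one testing-level group.
    performed_enough = any(p >= level_min for p in task.pct_by_level.values())
    # Excluded when supervisors perform it more than journeymen do.
    not_supervisory = task.pct_skill9 <= task.pct_skill5
    # Performed by at least 10% of incumbents in every major using command.
    air_force_wide = all(p >= command_min for p in task.pct_by_command.values())
    return performed_enough and not_supervisory and air_force_wide


def is_selected(task: TaskData) -> bool:
    """Usable by expert coding (code 9) and by the task data parameters."""
    return task.usability_code == 9 and passes_data_requirements(task)
```

The supervisory check is kept separate from the usability code on purpose: as noted above, it acts as a backup when a supervisory task has been miscoded.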

The task filtering processes described above have routinely
eliminated from SKT consideration anywhere from one-half to three-quarters
of the tasks contained in the survey instrument. The remaining tasks
have proved to be a quite manageable subset of tasks with strong claims
to testing importance. Once the subset of usable tasks was identified,
the selected tasks were assigned to one or more of the three AKT/SKT
testing levels. A task had to be performed by at least 20% of the
incumbents at any one level to be included as an appropriate task for
testing at that level. Typically, one-fifth to one-third of the tasks
would be assigned to only one level. The remaining two-thirds to
four-fifths would be assigned to more than one level. So far, the number of
usable tasks assigned per testing level has been between 17 and 142
tasks, depending to a large extent on the total number of tasks in the
job inventory and the homogeneity of the specialty.
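
The assignment of usable tasks to testing levels can be sketched as follows. The level labels are taken from the text; the function names and the dictionary layout are hypothetical, and the sketch is not the procedure actually programmed for the project.

```python
from typing import Dict, List

LEVELS = ("E-2/E-3", "E-5", "E-6/E-7")  # the three AKT/SKT testing levels


def assign_levels(pct_by_level: Dict[str, float],
                  threshold: float = 20.0) -> List[str]:
    """Return every level at which at least `threshold` percent perform the task."""
    return [lvl for lvl in LEVELS if pct_by_level.get(lvl, 0.0) >= threshold]


def share_single_level(tasks: List[Dict[str, float]],
                       threshold: float = 20.0) -> float:
    """Fraction of assigned tasks that land on exactly one testing level."""
    assigned = [assign_levels(t, threshold) for t in tasks]
    assigned = [a for a in assigned if a]  # ignore tasks assigned nowhere
    if not assigned:
        return 0.0
    return sum(len(a) == 1 for a in assigned) / len(assigned)
```

Applied to a specialty's usable tasks, `share_single_level` gives the kind of proportion quoted above (typically one-fifth to one-third).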

The  Criterion  Problem 

As anticipated, the development of an adequate criterion to assess
the testing importance of tasks proved to be the most difficult problem
of all. The problem was compounded by the fact that the Specialty
Knowledge Test Development Branch did not possess the resources for
gathering and processing the large amount of data that would be required
to obtain direct assessments of task testing importance from the more
than 350 AFSCs for which reasonably current occupational survey data
were available. On the other hand, obtaining task testing importance
ratings from the 3- or 4-man SKT teams at the beginning of a test
development project would be an exercise in futility. First of all,
data supplied by such a small sample would lack representativeness and
reliability, which were the major problems besetting the current method
of test outline development. Secondly, the data could not be processed
quickly enough to be available to the team when it was needed.

¹An SKT major revision team develops new test outlines, including outline
area weights, and performs a thorough rewrite of the tests. A
minor revision team merely updates the previous year's test.

A different avenue which showed more promise was to use task
factor data already being gathered by the Air Force Human Resources
Laboratory in support of training priorities research. A factor identified
as "field recommended task training emphasis" appeared to be a
reasonably close analog to task testing importance, close enough that
it could possibly be considered as a substitute for it. However, the
substitution of training emphasis for testing importance had several
drawbacks. First, the importance of a task for inclusion on a promotion
test, such as the SKT, may be high, even if there is no perceived need
for training in the task. Secondly, some tasks require training because
they are job specific and have, therefore, not been trained in the
school or encountered on previous jobs. Such tasks would be inappropriate
in an SKT, which is required to test broad AFSC knowledges. Thirdly,
training emphasis applies to both skills and knowledges, whereas the
SKT deals only with the knowledge components of tasks. These drawbacks
militated against the direct use of the recommended task training
emphasis factor as the criterion of task testing importance. However,
the six principal factors used by the Human Resources Laboratory to
predict training emphasis seemed to encompass all the elements of
testing importance as defined in the guiding documents for the SKT
program. These factors were: percent of members performing the task,
an index of percent time spent on the task by all members, task learning
difficulty, probable consequences of inadequate performance of the
task, task delay tolerance, and average grade level (by averaging the
percent of members in each grade performing the task).² The index of
percent time spent was so highly correlated with percent members
performing (in excess of .90 for all observed specialties) that the index
of percent time spent was dropped as being redundant. The average
grade level factor was considered important as a criterion for placing
tasks at the appropriate testing levels, but not as a predictor of
testing importance within testing levels. The remaining four factors
became the basis for two different methods of constructing a criterion
of task testing importance based on a weighted composite of the four
factors. The following paragraphs will be devoted to discussing the
two methods.

²Evidence that these factors were relevant to the development of SKT
outlines was presented in a study by Vaughan and Hickerson (reported at the
1976 MTA Conference) in which SKT test outline weights were reliably
predicted from occupational data gathered on these factors.

Development of a Composite Criterion by Policy Capturing with Simulated
Task Data

As stated previously, gathering criterion data on task testing
importance from large samples of incumbents in each specialty was not
feasible; on the other hand, the number of testing importance ratings
that could be supplied for the tasks in each Air Force specialty by the
individual SKT test development teams would not be sufficient to insure
reliability.

One way of surmounting these difficulties and obtaining testing
importance weights for the four predictor variables (percent performing,
difficulty, consequences, and delay tolerance) based on an adequate
number of raters was to develop a non-AFSC-related set of simulated
tasks for which randomly generated ratings on the four predictor variables
would be the only task data provided. This was done, and 56 members of
14 SKT teams were given the same set of tasks in the form of a deck of
125 randomly ordered punch cards, with each card containing four randomly
generated ratings printed on its blank reverse side. To
avoid confusion, the task delay tolerance factor, which used a reversed
scale relative to testing importance (1 = least tolerance for delay,
9 = most tolerance for delay), was reversed and called "requirement for
prompt performance" to make it directionally comparable to the other
three factor scales. A blank card containing factor titles was also
furnished so that the ratings could be identified with the appropriate
factor by superimposing the factor titles card on the data card.
Figure 2 shows three simulated task cards and a factor titles card.
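
A deck of this kind could be generated as in the Python sketch below. The sketch assumes that all four factors were rated on a 1-to-9 scale and drawn independently and uniformly; the text states the 1-to-9 range only for delay tolerance, and the names `make_deck` and `reverse_delay_tolerance` are hypothetical.

```python
import random
from typing import Dict, List

FACTORS = ("pct_performing", "learning_difficulty",
           "consequences", "delay_tolerance")


def make_deck(n_cards: int = 125, seed: int = 0,
              low: int = 1, high: int = 9) -> List[Dict[str, int]]:
    """Generate simulated tasks with independent random ratings per factor."""
    rng = random.Random(seed)
    return [{f: rng.randint(low, high) for f in FACTORS}
            for _ in range(n_cards)]


def reverse_delay_tolerance(card: Dict[str, int],
                            low: int = 1, high: int = 9) -> Dict[str, int]:
    """Flip delay tolerance so that a high value means prompt performance is required."""
    out = dict(card)
    out["prompt_performance"] = (low + high) - out.pop("delay_tolerance")
    return out
```

With a 1-to-9 scale the reversal maps 1 to 9 and 9 to 1, so that all four factors point in the same direction as testing importance.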

Each  subject-matter  specialist  was  asked  to  rankorder  the  cards 
(tasks)  on  testing  importance  using  the  information  provided  on  the 
four  factors.    It  was  up  to  the  subject-matter  specialist  to  visualize 
what  kind  of  task  might  fit  the  data  on  each  card.    The  actual  ranking 
of  the  cards  was  performed  only  after  the  cards  had  twice  been  sorted 
into  five  categories  of  testing  importance  (5  x  5  =  25  categories)  in 
order  to  simplify  the  ranking  process.    A  regression  equation  was 
computed  for  each  subject-matter  specialist  using  the  testing  importance 
rankings  as  the  criterion  variable  against  which  to  regress  the  ratings 
on  the  four  predictor  variables.    Four  cases  from  one  SKT  team  were 
dropped  because  the  members  apparently  did  not  perform  the  rankordering, 
as  evidenced  by  the  extremely  low  correlations  of  all  four  predictor 
variables  with  the  criterion  for  those  cases.    Two  other  cases  were 
dropped  because  of  missing  cards. 
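
The per-rater regression can be illustrated with a small, dependency-free Python sketch. It computes standardized (beta) weights and R² from the correlation matrix of the predictors. The solver and the function names are illustrative assumptions, and the criterion is assumed to be coded so that a higher value means greater testing importance.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def standardize(xs: List[float]) -> List[float]:
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def capture_policy(predictors: List[List[float]],
                   criterion: List[float]) -> Tuple[List[float], float]:
    """Regress one rater's criterion scores on the factor ratings.

    predictors: one list of ratings per factor, all of equal length.
    Returns (beta weights, R squared).
    """
    z = [standardize(col) for col in predictors]
    y = standardize(criterion)
    n = len(y)

    def corr(u: List[float], v: List[float]) -> float:
        return sum(p * q for p, q in zip(u, v)) / n

    rxx = [[corr(zi, zj) for zj in z] for zi in z]   # predictor intercorrelations
    rxy = [corr(zi, y) for zi in z]                  # correlations with criterion
    betas = solve(rxx, rxy)
    r_squared = sum(b * r for b, r in zip(betas, rxy))
    return betas, r_squared
```

A rater who ignored the factor data would show beta weights and an R² near zero, which is the pattern that led to four cases being dropped.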

A hierarchical clustering of the regression equations of the
remaining 50 cases was performed to determine whether there was more
than one ranking policy employed by the subject-matter specialists.
Four distinct ranking policies were identified, as shown in Table 1.

Figure 2. Factor Titles Card and
Three Simulated Task Cards

NOTE: Punched holes in card represent alphanumeric characters used in
titles.


Table 1

Beta Weights and R² Values for Four Testing Importance Policies
Identified by Hierarchical Clustering of Regression Equations,
Using Simulated Tasks

                            Beta Weights
POLICY     N    % PERF   LRN DIFF   CONSEQ   DELAY TOL*     R²

A         15    .3073     .1033     .4495      .4707      .5736
B         23    .0318     .1144     .8856      .1177      .8047
C          5    .0124     .7965     .1504      .0909      .6634
D          3    .9595     .0232     .1072      .0102      .9174
OVERALL   50    .1715     .1650     .5622      .1896      .4153

*Task delay tolerance scale was reversed and called "Requirement for
Prompt Performance."

Differences among policies are significant beyond .001 level of
confidence.


Policy A, which was used by 15 cases, gave the greatest weight to
consequences of inadequate performance and task delay tolerance, and
also gave substantial weight to percent members performing. Policy B,
which was used by 23 cases, gave overwhelming weight to consequences of
inadequate performance. Policy C, which was used by five cases, gave
overwhelming weight to task difficulty. Policy D, which was used by
three cases, gave overwhelming weight to percent members performing.
Four cases did not fall into any policy group. There did not appear to
be any identifiable AFSC pattern associated with the policy groupings
such that different equations could be called upon in comparing task
testing importance values for individual specialties or groups of
related specialties. Intercorrelations of the predictor variables are
not reported here, for the obvious reason that the random assignment of
factor ratings to the simulated tasks insured virtually zero correlations
between these variables. While it is true that the data which the
Human Resources Laboratory has gathered on the four predictor variables
show substantial intercorrelation, the goal of this policy-capturing
effort was to obtain uncontaminated correlations of the factors with
the criterion (testing importance). If intercorrelation had been built
into the assignment of ratings to the simulated tasks, the fact that
the four ratings on each punch card were locked together in the ranking
process would have induced spurious correlation of each factor with the
criterion.

Once uncontaminated correlations of the predictor factors with the
criterion were obtained, the real intercorrelations of the predictor
variables, which differed from one specialty to another, were plugged
into an appropriate regression model to compute task testing importance
values using the four-factor composite. Results and comparisons of
this method with the second method will be discussed after the second
method of criterion development has been presented.
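
The paper does not spell out the "appropriate regression model." One standard formulation, offered here only as an assumed reading, takes the factor-criterion correlations from the simulated-task study and the predictor intercorrelations from a given specialty, and solves for standardized weights. The symbols below are introduced for illustration and do not appear in the original.

```latex
% Assumed notation (not from the original paper):
%   r_{xy} : 4 x 1 vector of factor-criterion correlations (simulated tasks)
%   R_{xx} : 4 x 4 matrix of predictor intercorrelations (one specialty)
%   z_j    : 4 x 1 vector of standardized factor values for task j
\[
  \boldsymbol{\beta} = R_{xx}^{-1}\, r_{xy},
  \qquad
  R^{2} = r_{xy}^{\top} \boldsymbol{\beta},
  \qquad
  \widehat{T}_{j} = \boldsymbol{\beta}^{\top} z_{j}
\]
```

When the predictors are uncorrelated, as in the simulated deck, R_xx is the identity matrix and the beta weights equal the factor-criterion correlations.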

Development  of  Composite  Criterion  Based  on  Factor  Importance  Ratings 

Upon  completion  of  the  rankordering  of  the  simulated  tasks,  each 
subject-matter  specialist  was  asked  to  rate  on  a  nine-point  scale  each 
of  the  four  predictor  variables  on  how  important  he  thought  it  was  in 
determining  testing  importance.    Figure  3  shows  the  scale  used  for 
rating  the  four  predictor  variables  on  testing  importance. 

The coefficient of interrater agreement adjusted for differences
in the frame of reference for the individual raters was computed as a
measure of reliability. The average reliability was found to be .383
for a single rater (R11) and .972 for the means of the 56 raters (Rkk).
A second sample of 50 raters was later obtained to check the
representativeness of the 56-rater sample. No significant difference was found
between the adjusted factor weights of the two samples (see Table 2).
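
The two reliability figures are consistent with the Spearman-Brown relationship between single-rater and k-rater reliability. The paper does not state how Rkk was obtained, so the short Python check below is an illustration of that consistency and not a description of the author's computation.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * r_single / (1 + (k - 1) * r_single)


# Single-rater value and number of raters as reported in the text.
r_kk = spearman_brown(0.383, 56)
print(round(r_kk, 3))  # 0.972
```

A modest single-rater agreement therefore yields a highly stable mean once 56 raters are pooled.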

The  next  step  was  to  use  the  factor  rating  data  to  derive  an 
appropriately  weighted  composite  of  the  four  factors  that  would  serve 


Figure  3.    Scale  for  Rating  Four  Factors 
on  Testing  Importance 

RATING  OF  FACTOR  IMPORTANCE 

Kow  important  do  you  think  each  of  the  following  factors  is  in 
determining  the  testing  importance  of  a  task?   Use  the  9-point  rating 
scale  shown  below. 

The  factor  is; 


1. 

Extreinely  unimportant 

2. 

Very  unimportant 

3. 

Unimportant 

4. 

Slightly  unimportant 

5. 

So-so 

6. 

Slightly  important 

7. 

Important 

8. 

Very  Important 

9. 

Extremely  important 

Factor 

Rating  (1-9) 


1.  %  MEMBERS  PERFORMING   

Z.  LEARNING  DlM^iCULTT    . 

3.  CONSEQUENCES  OF  INAOEQUATE  PERFORMANCE    .  . 

4.  REQUIREMENT  FOR  PROMPTNESS  OF  PERFORMANCE  .  _ 


DEFINITIONS OF TASK RATING FACTORS

1. % Members Performing is a measure of the proportion of all airmen
in the appropriate Air Force Specialty or shredout who perform the
task.

2. Learning Difficulty is a measure of the need for lengthy, systematic
training before a new member of the appropriate Air Force Specialty
or shredout can perform the task adequately. It may be thought of
as the difficulty involved in "picking up" the task on the job without
any systematic training.

3. Consequences of Inadequate Performance is a measure of the seriousness
of the probable consequences of inadequate performance of the
task. It is measured in terms of possible injury or death, wasted
supplies, damaged equipment, wasted man-hours of work, etc.

4. Requirement for Promptness of Performance is a measure of how much
delay can be tolerated between the time an airman becomes aware the
task is to be performed and the time he must commence doing it.
Must he commence immediately, or does he have time to consult a
manual, seek guidance, or even be taught how to do it?


as  the  criterion  of  testing  importance.    However,  it  was  first  necessary 
to standardize each of the factors (mean = 5, S.D. = 1) so that all
factors would possess equal weight prior to the application of the
rater-derived weights. One additional problem existed in regard to the
"percent members performing" factor: it was very negatively skewed in
all  samples  and  the  standard  deviation  was  approximately  equal  to  the 
mean.    As  a  result,  task  percentages  below  the  mean  would  tend  to  be 
underweighted  and  percentages  above  the  mean  overweighted,  even  after 
standardization.    Therefore,  it  was  necessary  to  extract  the  logarithm 
of  this  variable  prior  to  standardization  in  order  to  reduce  the 
skewness. 
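
The two preprocessing steps just described, a logarithmic transformation of "percent members performing" followed by standardization to mean = 5 and S.D. = 1, can be sketched in Python as follows. This is a minimal illustration rather than the CODAP procedure: the function names and the sample percentages are hypothetical, the population standard deviation is assumed, and the treatment of tasks with zero percent performing is not covered because the paper does not describe it.

```python
import math
import statistics


def standardize(values, mean=5.0, sd=1.0):
    """Rescale values linearly to the requested mean and standard deviation."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)  # population S.D. (an assumption)
    return [mean + sd * (v - m) / s for v in values]


def standardized_log_percent(percents):
    """Take logarithms to reduce skewness, then standardize (mean 5, S.D. 1)."""
    logs = [math.log10(p) for p in percents]  # every percentage must be > 0
    return standardize(logs)


# Hypothetical percentages of members performing five tasks.
percents = [2.0, 5.0, 10.0, 20.0, 80.0]
scores = standardized_log_percent(percents)
print([round(s, 2) for s in scores])
```

The base of the logarithm does not matter here, because any change of base is a constant multiplier that the standardization step removes.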


A  covariance  weighting  technique  was  used  to  adjust  the  factor 
weights  derived  from  the  ratings  of  factor  importance.    This  was  done 
so as to ensure that the factor weights would be in accord with their
relative  independence.    The  procedure  used  to  accomplish  the  covariance 
weighting  is  presented  in  Appendix  A. 

Comparison  of  Card-Sorting  and  Factor  Importance  Rating  Approaches 

The  card-sorting  policy-capturing  approach  had  two  distinct 
advantages: 

1.  The  ranking  of  simulated  tasks  on  testing  importance 
involved  the  simultaneous  consideration  of  all  four  predictor  variables, 
rather  than  one  at  a  time. 

2.  Multiple  observations  were  obtainable  on  each  rater. 

On  the  other  hand,  the  card-sorting  policy-capturing  technique 
disclosed  several  weaknesses: 

1. Many subject-matter specialists found the rankordering
procedure  overly  complex  and  difficult  to  understand. 

2.  Many  subject-matter  specialists  were  turned  off  by  the 
fact  that  they  were  expected  to  rankorder  sets  of  four  numbers  that 
were  not  associated  with  identifiable  tasks. 

3. Three of the four policies identified through the hierarchical
grouping of individual rater policy equations gave 76% to 87%
of the testing importance weight to a single factor. Only 15 out of 50
raters used a multiple factor policy. This finding indicated that most
raters took the line of least resistance and simply rankordered the
simulated tasks on a single variable because that was the easiest thing
to do.

4.  The  rankordering  of  simulated  task  decks  lacks  credibility. 
The  technique  is  difficult  to  explain  and  is,  therefore,  difficult  to 
justify.    The  technique  also  operated  something  like  a  black  box;  you 
knew  what  went  in  and  what  came  out,  but  were  not  at  all  sure  what 
happened in between. By way of overall assessment of the policy-capturing
technique, I would not recommend its use with enlisted personnel,
and, if used at all, it should be very carefully explained and
illustrated with numerous examples. It should also be carefully monitored
during the entire time the procedure is being performed.

The factor importance weighting technique had two distinct advantages:

1. Obtaining overall testing importance ratings for the four
predictor variables was a relatively quick and simple way of obtaining
factor weights from a large number of subject-matter specialists.

2. The statistical techniques used to combine rater weights
with covariance data to predict a composite criterion were straightforward,
and the process was totally visible, which lent it credibility.

The principal weakness of the factor importance weighting technique
is that the factor weights specified by the raters might not represent
the policy they would have chosen had they been able to see the resultant
testing importance values. However, the effect of using inadequate
weights can be corrected over time by subsequent subject-matter specialists
who operationally use the task testing importance values based on
the inadequate weights. Suggested changes in the ranking of specific
tasks can be translated into weighting revisions in the applicable
regression equation.

Comparisons of the beta weights for the card-sorting group and comparable
scaled-down mean rating weights for the two factor importance weighting
groups are shown in Table 2. Although the two weighting schemes yielded
the same rankordering of variables in terms of relative contribution in
predicting the criterion, the two weighting systems would produce
significantly different values if applied to the same criterion.
Based  on  the  previously  stated  assessments  of  the  two  methods,  it  would 
appear  that  more  faith  should  be  placed  in  weights  derived  by  the 
factor  importance  weighting  technique,  although  these  weights,  too,  are 
suspect  until  such  time  as  they  have  been  subjected  to  further  validation. 


Table  2 

Weights  Derived  From  Ranking  Technique  for  One  Sample 
and  Rating  Techniques  for  Two  Samples 


                          Card-Sorting   Factor Importance   Factor Importance
                          Sample         Sample #1           Sample #2
Factor                    N=50           N=56                N=50

% MEMBERS PERFORMING      .1715          .2309               .2684
LEARNING DIFFICULTY       .1650          .2250               .2480
CONSEQ INADEQ PERF        .5622          .3444               .3090
REQ FOR PROMPT PERF*      .1896          .2880               .2629


*Task  delay  tolerance  scale  was  reversed  and  called  "Requirement  for 
Prompt  Performance." 

Differences between the ranking weights and the two sets of rater weights
are significant beyond the .001 level of confidence. Differences between the
two sets of factor importance weights are not significant (p>.05).


Development  of  Task-Based  Computerized  Outline  Formats 


Up  to  this  point,  I  have  discussed  the  selection  of  the  subset  of 
usable  tasks,  the  sorting  of  these  tasks  into  the  three  SKT  testing 
levels,  and  the  development  of  a  composite  task  variable  which  assigns 
a  testing  importance  value  to  each  task  in  the  subset.    It  should  be 
noted  that  the  testing  importance  value  for  the  same  task  will  vary 
from  one  testing  level  to  another  because  of  differences  in  the  percent 
of  members  performing  the  task  at  each  level.    Task  learning  difficulty, 
however, is, by definition, invariant from level to level.

Imposing  the  structure  of  a  test  outline  on  a  set  of  tasks  to  be 
presented  in  a  computerized  outline  required  an  important  decision  as 
to  how  the  tasks  should  be  organized  to  form  meaningful  outline  areas. 
Using  the  categories  or  modules  from  the  previous  SKT  outline  seemed  to 
be  the  obvious  solution.    However,  it  soon  became  apparent  that  this 
procedure  had  four  serious  drawbacks: 

1. The test outlines used to develop previous SKTs are
controlled-item documents. To extract the outline areas from these
documents  for  use  in  a  computerized  outline  would  involve  serious 
security  problems  that  would  be  difficult  to  control. 

2.  Outlines  for  many  specialties  contain  extensive  content 
areas  dealing  with  general  principles;  e.g.,  electronic  principles, 
mechanical  principles,  etc.,  as  well  as  the  more  directly  job-related 
categories.    Many  tasks  could  not  be  unambiguously  assigned  to  either 
the  general  principles  area  or  the  job-related  area. 

3.  Previous  test  outlines  may  well  be  out  of  date  and  need 
considerable  revision  before  reuse.    Extensive  outline  revision  would 
negate one of the primary purposes of developing a computerized outline:
saving time.

4. Many test outlines are very personal documents that
embody the peculiar characteristics of the team that produced them. As
such, they are often unacceptable to a subsequent SKT team.

The  second  document  to  be  considered  was  the  Air  Force  Specialty 
Training  Standard  (STS),  which  lists  in  outline  format  the  various  job 
areas  of  an  Air  Force  specialty  in  which  OJT  is  to  be  conducted.  It 
also  specifies  the  performance  and  knowledge  levels  that  should  be 
attained  in  each  job  area  at  the  apprentice,  journeyman,  and  technician/ 
supervisor  skill  levels.    This  document  is  prepared  at  the  training 
center  responsible  for  the  formal  training  courses  pertaining  to  the 
applicable  specialty  and  achieves  official  Air  Force  status  upon  being 
coordinated  and  approved  at  command  and  Air  Staff  levels.    After  initial 
publication,  the  STS  continues  to  be  updated  as  needed.    The  official 
status  of  the  STS,  the  manner  in  which  it  is  prepared  and  approved,  and 
its currency seemed to make it an ideal document from which to obtain
a framework for organizing tasks into meaningful test outline modules.
As  good  fortune  would  have  it,  modular  organization  of  tasks  had  also 
become  a  necessary  extension  of  training  priorities  research  being 
conducted  at  the  Air  Force  Human  Resources  Laboratory.    As  a  result, 
modular  capability  was  added  to  the  Comprehensive  Occupational  Data 
Analysis  Programs  (CODAP)  about  six  months  after  its  need  became 
critical  to  the  development  of  the  SKT  computerized  outline.  However, 
in  order  to  group  tasks  into  STS  modules,  tasks  first  had  to  be  matched 
with  STS  work  areas.    STS  coding  of  tasks  by  STS  paragraph  numbers  was 
to  be  performed  by  SKT  teams  at  the  same  time  as  they  coded  tasks  for 
"usability."    Directions  given  to  subject-matter  specialists  for  STS 
coding of tasks are shown in Figure 4. Sample pages from a job inventory
booklet illustrating STS coding, usability coding, and write-in
tasks are shown in Appendix B. Just as task usability coding was
accomplished by team consensus, so was STS coding. However, more
than  one  STS  area  was  allowed  to  be  assigned  to  a  task  if  the  subject- 
matter  specialists  felt  that  this  was  necessary.    Two  job  inventory 
booklets  were  coded--one  booklet  was  retained  for  use  in  SKT  outline 
development  and  the  other  was  made  available  to  job  inventory  developers 
and  job  analysts  at  the  Occupational  Measurement  Center.    The  developers 
and  analysts  used  the  STS  and  task  usability  codings,  as  well  as  suggested 
task  revisions  and  additions  provided  by  the  SKT  team,  to  assist  in 
developing  and  updating  job  inventories,  or  as  an  aid  in  organizing, 
evaluating,  and  analyzing  job  survey  data.    The  job  inventory  booklet 
retained  for  SKT  outline  development  was  forwarded  for  keypunching  of 
the  STS  and  task  usability  codes.    STS  work  area  titles  were  keypunched 
at  the  same  time. 


Figure  4.    Coding  Instructions  for 
Recording  STS  Areas 


CODING INSTRUCTIONS
1. RECORDING STS AREAS

A. For each task, indicate the appropriate STS area, subarea, and
sub-subarea in the "Check" column; e.g., 3c(2).

B. If there are no subareas, record only the major area; e.g., 14.

C. If more than one STS area, subarea, or sub-subarea applies, record
all of them; e.g., 3c(2), 7c(1, 2, 3, 4).

D. At any level of the STS which does not apply, record a dash; e.g.,
- (no STS area applies), 3- (no subarea or sub-subarea applies),
3c- (no sub-subarea applies).


The  Development  of  Weights  for  Test  Outline  Areas 

The  subset  of  tasks  selected  for  use  on  the  SKTs  was  carefully 
screened for usability; i.e., only tasks with a usability code of "8"
or "9" were retained. Assuming that appropriate testing importance
weights had also been computed for each task, it seemed logical that
the procedure for weighting an outline area was to sum the task testing
importance weights for that area and divide this sum by the sum of
testing importance weights for all tasks in the SKT subset. This
computation was carried out in each outline area to transform the sums
into  proportionate  weights,  and  the  proportionate  weights  were  easily 
convertible  into  quotas  for  the  number  of  items  to  be  written  per 
outline  area  (total  of  135  per  SKT,  80  per  Apprentice  Knowledge  Test) 
and  the  number  of  items  that  would  actually  be  selected  for  use  in  the 
test  per  outline  area  (total  of  100  per  SKT,  65  per  AKT).    For  tasks 
that  had  been  coded  into  more  than  one  STS  area,  the  procedure  was  to 
divide  the  testing  importance  weight  for  each  task  by  the  number  of 
assigned  STS  areas  and  use  this  partial  weight  in  the  summing  operation. 
As  of  now,  only  the  summing  of  the  testing  importance  weights  by  outline 
area  can  be  accomplished  by  computer. 

The  computation  of  the  proportionate  weights  and  the  conversion  of 
proportionate  weights  to  numbers  of  items  to  write  and  to  select  must 
be done by hand. Hopefully, the necessary computer programming to
replace  the  manual  operation  will  be  accomplished  within  the  next 
several  months. 
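
The outline-area weighting procedure described above (summing task weights by STS area, splitting the weight of a multiply coded task evenly among its areas, converting sums to proportions, and converting proportions to item quotas) can be sketched in Python as follows. This is an illustration only, not the planned CODAP programming. The task values are hypothetical, and the rounding rule (largest remainder, so that quotas add up exactly to the required total) is an assumption, since the paper does not say how fractional quotas were resolved by hand.

```python
from collections import defaultdict


def area_weights(tasks):
    """tasks: list of (testing_importance, [STS areas]); returns proportions by area."""
    sums = defaultdict(float)
    for importance, areas in tasks:
        share = importance / len(areas)  # partial weight for multiply coded tasks
        for area in areas:
            sums[area] += share
    total = sum(sums.values())  # equals the sum of all task weights
    return {area: s / total for area, s in sums.items()}


def item_quotas(weights, n_items):
    """Convert proportions to whole-item quotas (largest-remainder rounding, assumed)."""
    raw = {area: w * n_items for area, w in weights.items()}
    quotas = {area: int(x) for area, x in raw.items()}
    leftover = n_items - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda a: raw[a] - quotas[a], reverse=True)
    for area in by_remainder[:leftover]:
        quotas[area] += 1
    return quotas


# Hypothetical tasks: (testing importance, assigned STS areas).
tasks = [(5.0, ["3c(2)"]), (2.0, ["3c(2)", "7c(1)"]), (1.0, ["14"])]
weights = area_weights(tasks)
to_write = item_quotas(weights, 135)   # items to be written per SKT
to_select = item_quotas(weights, 100)  # items to be selected per SKT
print(weights, to_write, to_select)
```

For an Apprentice Knowledge Test the same functions would be called with 80 and 65 in place of 135 and 100.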

Actualization  of  the  Computer-Derived  Outline 

All  the  components  required  for  the  computer-derived  outline  were 
now  ready  for  assembly  into  a  printed  output.    Two  types  of  format  were 
decided  upon:    one  for  use  with  a  vertical  method  of  test  construction, 
and  another  for  use  with  a  horizontal  method.    In  the  vertical  method, 
SKTs  are  constructed  for  one  testing  level  at  a  time.    A  print  format 
was  designed  for  this  method  that  consisted  of  three  separate  outlines, 
each  of  which  listed  only  the  tasks  to  be  tested  at  that  level  in  order 
of testing importance. Appendix C shows a portion of a 5-level (E-5)
outline  for  use  with  the  vertical  method  of  test  construction.    In  the 
horizontal  method,  all  levels  of  SKTs  are  constructed  simultaneously, 
with  test  items  being  assigned  to  the  appropriate  level  as  they  are 
being  written.    For  this  method,  the  most  appropriate  outline  was  one 
which  presented  the  outlines  for  all  levels  side  by  side  in  a  single 
document.   The  entire  set  of  tasks  to  cover  all  levels  was  listed,  with 
zero testing importance values showing for a task at the level(s) for
which the task was not appropriate. Tasks were listed within each STS
module in job inventory sequence, rather than ordered on testing importance.
Large open spaces were provided to the right of each column of
testing importance values to allow room for test item numbers or other
information  to  be  recorded.    Appendix  D  shows  a  portion  of  a  combined 
5-level  (E-5)  and  7-level  (E-6/E-7)  outline  designed  for  use  with  the 
horizontal  method  of  test  construction. 

The  printed  output  for  either  format  is  the  product  of  the  CODAP 
MODCHK and FACPRT programs. An "executive summary" option has recently
been added to FACPRT, which performs the summation of the testing
importance  values  by  STS  module.    In  the  near  future,  the  executive 
summary  option  will  also  be  able  to  compute  and  display  quotas  for  the 
number  of  test  items  to  be  written  and  the  number  of  test  items  to  be 
selected for each STS module. Appendix E shows an example of an executive
summary as it is expected to appear.

In  a  paper  which  immediately  follows  this  one,  Capt  Conrad  Bills 
will  assess  the  usability  of  the  computer-derived  test  outlines,  based 
on  his  experiences  in  using  them  in  several  SKT  test  construction 
projects. 

Concluding  Discussion 

Perhaps  the  weakest  link  in  the  development  of  a  test  outline  from 
occupational  survey  data  is  that  the  task  data  do  not  specify  the  kind 
or  degree  of  knowledge  required  to  successfully  perform  each  task. 
Nevertheless,  the  computerized  outlining  procedure  presented  in  this 
paper  can  be  justified  in  several  ways.    First  of  all,  the  use  of  task- 
based  data  can  be  justified  on  the  grounds  that  the  Equal  Employment 
Opportunity  Coordinating  Council  (EEOCC)  guidelines  are  better  served 
by  a  test  based  directly  on  task  data.    Such  a  test  is  ostensibly  more 
job-related than a test based on knowledge requirements, because knowledge
requirements are at least one step removed from the task level and
are  more  subjective.    Secondly,  the  task  statements  in  an  outline  can 
be  viewed  as  stimuli  and  the  task  data  as  guidelines  in  directing 
subject-matter  specialists  toward  selecting  and  emphasizing  in  their 
test  item  writing  those  knowledges  that  are  most  pertinent  to  the  job. 
In  this  model,  subject-matter  specialists  are  viewed  as  the  link  between 
task  specifications,  as  laid  out  in  the  task-based  outline,  and  knowledge 
specifications,  as  determined  by  work  experience  and  reference  materials. 
Thirdly,  the  computerized  outlining  procedure  can  be  defended  as  a 
generalized procedure that is able to incorporate job knowledge requirements
in the outlining process with little difficulty when such information
is available. In the electronics career fields, for example, the
electronics  principles  inventory  developed  by  O'Connor,  Ruck,  and 
Driskill  (1975)  could  be  interfaced  with  specially  screened  task  lists 
in  such  a  way  as  to  attach  an  electronics  theory  section  onto  task- 
based  outlines  for  critical  tasks.    The  resultant  outline  would,  in 
fact,  more  closely  resemble  the  conventional  outlines  developed  by 
subject-matter  specialists,  who  typically  include  a  theory  section. 
The Plan of Instruction (POI) generated at the training center for each
formal course could also be coded to the task list to provide a detailed
interface between tasks and knowledge requirements.

Another problem relating to knowledge requirements is that of
tasks which have overlapping knowledge requirements. Why, for example,
should two tasks with heavily overlapping knowledge requirements be
allowed to have separate testing importance weights and thereby make
independent contributions to the computation of testing importance
weights for STS modules? It could just as easily be asked, "Why not?"
If an item of knowledge is applicable to two important tasks, its
importance is better reflected by allowing it the summed weight of both
tasks than by limiting its weight to that of one task. One exception
to this rule would be redundant tasks. The number of redundant tasks,
however, will be few in a well-constructed job inventory. What redundancy
exists should be virtually eliminated by the task usability
coding process. Code "4" is intended to filter out redundant tasks.
Even if knowledge requirements were available for tasks, it would be
virtually impossible to determine the degree of knowledge overlap
between two tasks with similar knowledge requirements.

Another  major  area  of  concern  has  been  the  job  survey  data  itself. 
Subject-matter  specialists  have  frequently  complained  that  the  inventory 
data are incomplete, outdated, or inaccurate. In the case of surveys
more than two years old, outdatedness can be a serious problem.
However, unless there is evidence of extensive career field changes,
it is more likely than not that survey data based on the responses of
hundreds, and perhaps thousands, of job incumbents are still more accurate
overall than the limited experience of several subject-matter specialists.
Incompleteness  of  the  inventory  task  list  is  an  area  that  can  best  be 
handled  by  adding  and  weighting  the  missing  tasks,  which  are  requested 
as  part  of  the  task  usability  coding  process  (see  sections  B  and  C  of 
Figure  1).    As  stated  previously,  the  write-in  tasks  are  forwarded  to 
the  job  inventory  developers  to  be  considered  for  inclusion  in  the  next 
task  inventory.    This  interplay  between  the  testing  process  and  the 
inventory  development  process  should,  in  time,  accrue  to  the  benefit  of 
both. 

The  validity  of  the  weighted  composite  as  a  measure  of  testing 
importance  remains  an  area  of  continuous  evaluation.    While  the  weights 
derived  from  the  original  sample  of  56  raters  had  high  interrater 
agreement (.972), there is no guarantee that the sample was representative.
A subsequent sample of 50 raters produced weights that were not
significantly  different  from  the  weights  derived  from  the  56-rater 
sample  (see  Table  2).    One  possible  solution  to  the  weighting  problem 
would  be  to  gather  factor  ratings  from  one  complete  year  of  SKT  test 
development teams (four personnel from each of approximately 250 specialties
and shredouts). Not only would the sample be large and representative,
but another look could be taken at the possibility of finding
differential rating policies attributable to specialty, career field,
command, grade, or other variables. Differential rating policies that
can be tied to specific variables can be translated into weighted
testing importance composites tailored to the specific outline requirements
of each SKT. Such a project would not be too costly in time or
manpower.    Obtaining  the  factor  ratings  would  take  about  ten  minutes 
per  SKT  team,  including  instructions,  and  would  ideally  take  place  at 
the  conclusion  of  a  test  development  project.    During  the  year  of  data 
gathering,  the  weighted  equation  currently  in  use  could  continue  to  be 
updated as additional ratings are obtained.

The current testing importance equation is confined to "percent of
members  performing"  and  "task  learning  difficulty"  for  most  specialties 
because  of  the  frequent  unavailability  of  data  on  "probable  consequences 
of  inadequate  performance"  and  "task  delay  tolerance."   On  the  other 
hand,  there  is  a  continual  buildup  of  data  on  the  "recommended  training 
emphasis"  variable.   While  it  does  not  appear  that  training  emphasis 
can  be  used  as  a  substitute  for  testing  importance,  as  discussed  earlier 
in  this  paper,  studies  should  be  made  to  compare  training  emphasis  with 
testing importance to determine what generalizations can be made concerning
similarities and differences. It may well be that training emphasis
could  play  an  important  role  in  improving  estimates  of  testing  importance 
of  tasks. 

Another  important  area  in  which  potential  problems  exist  is  that 
of  task  selection  criteria.    The  need  to  use  the  major  command  variable 
in  selecting  tasks  came  to  light  only  after  it  was  discovered  that  a 
whole  block  of  command-specific  tasks  had  been  added  to  a  computerized 
outline  because  no  selection  criterion  had  been  applied  which  required 
that  incumbents  performing  a  task  be  representative  of  all  the  major 
using  commands.    As  experience  with  the  computerized  outline  grows, 
other  necessary  selection  criteria  will  undoubtedly  come  to  light,  and 
current  selection  criteria  will  have  to  be  augmented.    Some  of  the  new 
criteria  may  be  general,  others  may  be  specific  to  a  specialty  or 
career  field. 

Although  the  computerized  test  development  outline  is  intended  to 
be  a  stand-alone  product,  it  is  not  intended  to  be  the  only  computer 
product  used.    It  would  be  foolish  for  an  SKT  team  not  to  use  other 
available  occupational  survey  data,  such  as  the  variable  summary  (VARSUM) 
which  contains  information  on  tools,  equipment,  manuals,  and  procedures 
used  by  job  incumbents,  as  well  as  other  information  pertinent  to  the 
test  construction  process. 

Various  time-saving  devices  are  under  consideration  to  make  the 
computerized  outlining  technique  more  cost  effective.    One  such  device 
is  to  provide  the  subject-matter  specialists  who  perform  the  STS  and 
task  usability  coding  with  a  task  list  from  which  low  performance, 
supervisory,  and  command-specific  tasks  have  been  eliminated,  in  lieu 
of  the  current  requirement  that  all  tasks  in  the  job  inventory  booklet 
be  coded.    This  would  probably  cut  the  normal  four-hour  coding  time  to 
no  more  than  an  hour.    Another  time-saving  device  would  be  to  simplify 
and  automate  as  much  of  the  computer  runstream  as  possible.  Currently, 
as  many  as  ten  separate  computer  runs  are  being  made  to  produce  the 
final  outline  product.    A  third  time  and  cost  saver  would  be  to  develop 
standardized,  self-explanatory  forms  for  requesting  a  computerized 
outline.    This  would  reduce  request  preparation  time  and  would  permit 
the  use  of  low-pay  clerical  personnel  to  prepare  the  requests. 

The  ultimate  computerized  outline  document  is  at  least  several 
months away. This document will not only provide the listing of tasks
within STS modules and the testing importance values for tasks, but
also the number of test items to write and select on each task. In
addition, a summary report will be provided that will list the STS
modules in outline format and the percentage weight computed for each
module, along with quotas for the number of test items to be written
and the number of test items to be selected for each module.

Since the development and implementation of the new test outline
technique described in this paper is an ongoing, incremental, and
interactive process, occasional modifications will be required, and a
few specialties may present insurmountable difficulties. Nevertheless,
the procedure is viable, and the alternative, failure to make adequate
use of occupational data in test development, is no longer acceptable.


Appendix A
Factor  Covariance  Weighting  Technique 


With factor importance weights computed and factor data standardized,
the next step in the factor covariance weighting technique was to
determine the correlation of each weighted factor with the four-variable
composite criterion, including in the computation the known covariances
of the component variables. This was accomplished by applying the
following equation to each component variable of the composite, using
as  input  the  four-variable  variance-covariance  matrix  for  a  specific 
Air Force specialty:

\[
r_{1c} = \frac{\sum_{i=1}^{n} w_1 w_i r_{1i}}
              {\sqrt{\sum_{i=1}^{n} w_i^{2} + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_i w_j r_{ij}}}
\]

where

$r_{1c}$ = correlation of variable 1 with composite

$w_1$ = rater-derived weight for variable 1

$\sum_{i=1}^{n} w_1 w_i r_{1i}$ = the sum of the cross-products of rater-derived
weights and covariances which involve variable 1

$\sum_{i=1}^{n} w_i^{2}$ = the sum of the squared weights for the "n" variables

$2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_i w_j r_{ij}$ = the sum of the cross-products
of weights and covariances for the entire variance-covariance matrix


NOTE:    Since  the  variance  elements  in  the  variance-covariance  matrix  = 
1.00,  they  have  been  dropped  from  all  cross-products  in  the 
equation.

With the correlations between each variable and the composite
criterion computed, it was then possible to regress the four component
variables on the composite to arrive at their appropriate standard score
weights. These weights would, of course, vary from specialty to specialty.
When actually computing testing importance values for tasks, the criterion
parameters were standardized to mean = 5 and S.D. = 2 (see footnote 3). These parameters
were  chosen  to  satisfy  several  requirements: 

1.  To  standardize  and  simplify  the  interpretation  of  task  testing 
importance  values  across  all  specialties. 

2.  To  identify  tasks  of  very  low  testing  importance  which  would 
later  be  eliminated  from  inclusion  in  the  computerized  test  outline. 
This  was  accomplished  by  setting  the  testing  importance  of  a  task  equal 
to zero if the computed testing importance was less than zero (more than
2.50 S.D. below the mean).

3.  To  maximize  the  variance  of  the  criterion  composite  without 
deviating  from  requirements  1  and  2.    Maximizing  the  variance  not  only 
added  visual  emphasis  to  differences  in  testing  importance  between  tasks, 
but  also  reduced  the  mean  (relative  to  the  variance)  in  the  calculation 
of  outline  area  weights.    Weighting  of  the  test  outline  will  be  discussed 
later. 

In  actual  practice,  "probable  consequences  of  inadequate  performance" 
and  "task  delay  tolerance"  data  were  not  available  for  the  specialties  in 
which  there  was  an  opportunity  to  experiment  with  this  second  technique. 
As  a  result,  the  computation  of  task  testing  importance  included  only  the 
"percent  members  performing"  and  "task  difficulty"  variables.    Even  here, 
the  factor  covariance  weighting  technique  was  applicable  and  differences 
in  the  single  covariance  value  produced  differences  in  the  standard 
score  weights  for  the  two-variable  composite. 
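
To illustrate why a change in the single covariance value changes the standard score weights in the two-variable case, the sketch below applies the standard two-predictor regression formula to the correlations of each variable with the composite. The raw weights and intercorrelations are hypothetical, and the function name is an assumption; the actual weighting values used for any specialty are not given here.

```python
import math

def two_variable_betas(w1, w2, r12):
    """Standard score weights for a two-variable composite.

    w1, w2 are the (hypothetical) raw weights and r12 is the correlation
    between the two standardized variables; requires abs(r12) < 1.
    """
    sd_c = math.sqrt(w1 * w1 + w2 * w2 + 2.0 * w1 * w2 * r12)
    r1c = (w1 + w2 * r12) / sd_c   # correlation of variable 1 with composite
    r2c = (w2 + w1 * r12) / sd_c   # correlation of variable 2 with composite
    denom = 1.0 - r12 * r12
    beta1 = (r1c - r12 * r2c) / denom
    beta2 = (r2c - r12 * r1c) / denom
    return beta1, beta2

# Same raw weights, two different intercorrelations.
low_r = two_variable_betas(1.0, 1.0, 0.2)
high_r = two_variable_betas(1.0, 1.0, 0.6)
```

In this sketch the weights shrink as the intercorrelation rises, because the composite's standard deviation grows while the raw weights stay fixed.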


3. Since each of the four factors in the composite was standardized to
mean = 5 and S.D. = 1, the mean of the composite would also equal 5.
To set the standard deviation of the composite equal to 2, all that was
necessary was to multiply each of the four beta weights by 2 before
computing the composite.
Appendix B

Sample  Pages  from  Job  Inventory  Booklet  Illustrating 
STS  Coding,  Usability  Coding,  and  Write-in  Tasks 


[Sample page from the Job Inventory booklet, 421X5. The STS and usability
code entries printed beside each task are not legible in this copy.]

J. MAINTAINING AEROSPACE GROUND EQUIPMENT (AGE) ELECTRICAL SYSTEMS

1. Calibrate and align printed circuit board circuits
2. Clean and adjust contactor points
3. Clean and adjust electrical thermostats
4. Clean and adjust magneto or distributor points
5. Clean or regap spark plugs or ignitor plugs
6. Interpret and use wiring diagrams in tracing electrical systems
7. Isolate defective equipment components or wiring
8. Measure the values of electrical systems using test equipment
9. Perform technical order modifications on Aerospace Ground
   Equipment (AGE) electrical systems
10. Prepare AGE electrical systems for storage
11. Rebuild distributors or magnetos
12. Rebuild load contactors
13. Rebuild relay panels
14. Rebuild voltage regulators
15. Remove AGE electrical systems from storage
16. Remove, inspect, clean, or install electrical [illegible] components
17. Remove or install cannon plugs
[Sample computer printout page listing chapel management tasks by STS
paragraph. Column headings and the rotated margin caption are not legible
in this copy; the three numeric columns are reproduced as printed, and
"?" marks characters that cannot be read.]

035  6B1  PREPARE AND RESTORE RELIGIOUS FACILITIES, EQUIPMENT, AND
          APPOINTMENTS FOR RELIGIOUS SERVICES

H ?9  PREPARE CHAPEL FACILITIES TO SUPPORT CATHOLIC WORSHIP SERVICES          5.81  4.23  54.9
H 17  PREPARE CHAPEL FACILITIES TO SUPPORT CATHOLIC SACRAMENTAL RITES         5.69  4.28  50.0
?     PREPARE CHAPEL FACILITIES FOR ECUMENICAL SERVICES                       [values illegible]
H 25  PREPARE CHAPEL FACILITIES TO SUPPORT GENERAL PROTESTANT WORSHIP
      SERVICES                                                                5.43  3.95  53.5
H 23  PREPARE CHAPEL FACILITIES TO SUPPORT GENERAL PROTESTANT SACRAMENTAL
      RITES                                                                   5.37  4.06  48.6
I 15  RESTORE CHAPEL FACILITIES AFTER USE                                     4.85  3.43  54.2
?     PREPARE CHAPEL FACILITIES TO SUPPORT [illegible] SERVICES/ACTIVITIES    [values illegible]
H 20  PREPARE CHAPEL FACILITIES TO SUPPORT DENOMINATIONAL SACRAMENTAL RITES   4.62  4.27  29.2
H 21  PREPARE CHAPEL FACILITIES TO SUPPORT DENOMINATIONAL SERVICES            4.5?  4.32  27.1
H 34  PROVIDE LITERATURE FOR CHAPEL ORIENTED PROGRAMS                         4.51  4.?8  29.2
?     ORGANIZE LAY PERSONNEL TO SUPPORT SACRAMENTAL RITES                     4.5?  4.80  20.1
H  8  NEUTRALIZE CHAPEL ALTAR AFTER SERVICES                                  4.38  3.19  49.3
I  4  CLEAN ECCLESIASTICAL EQUIPMENT                                          3.76  3.12  37.5

036  6B2  RELIGIOUS EDUCATION

H  2  COORDINATE WITH LAY PERSONNEL IN SUPPORT OF RELIGIOUS EDUCATION
      ACTIVITIES                                                              6.38  4.56  59.7
I 15  RESTORE CHAPEL FACILITIES AFTER USE                                     4.85  3.43  54.2
H 34  PROVIDE LITERATURE FOR CHAPEL ORIENTED PROGRAMS                         [values illegible]
H 32  PREPARE FACILITIES FOR RELIGIOUS EDUCATION ACTIVITIES SUCH AS
      PRE-MARRIAGE OR PARENT EFFECTIVENESS TRAINING (P.E.T.)                  4.21  3.85  30.6
H  4  MAINTAIN RELIGIOUS EDUCATION CURRICULUM CATALOGS                        3.60  3.62  25.7
H  3  ISSUE RELIGIOUS EDUCATION MATERIALS OR SUPPLIES                         3.59  3.79  22.9

037  6B3  RELATED CHAPEL ACTIVITIES

H 16  PREPARE CHAPEL FACILITIES FOR MEMORIAL/FUNERAL SERVICES                 5.77  4.32  50.7
EVALUATION OF COMPUTER-DERIVED TEST OUTLINES
USING CONVENTIONAL TEST OUTLINES AS A
CRITERION REFERENCE DURING TEST
DEVELOPMENT PROJECTS

Conrad  G.  Bills,  Capt,  USAF 
USAF  Occupational  Measurement  Center 

At the 1976 and the 1977 Military Testing Association conferences, Vaughan
(1976, 1977) described the interrelationship between test construction and
occupational surveying activities at the USAF Occupational Measurement Center.
As part of the cross-feed between these two activities, Vaughan (1977) described
a procedure for the automated conversion of occupational survey data into a test
outline. This computer-derived outline would indicate the number of test items
to be written on each topic. Two procedures had been attempted, and he mentioned
that a synthesis of these procedures was being tested. William J. Phalen (1978)
has described the development of this synthesized technique for using
occupational survey data to construct and weight computer-derived test outlines.
This technique is designed to increase the relative ease with which occupational
survey data can be incorporated into the test construction process. The
incorporation of survey data will in turn strengthen the content validity
position (Vaughan, 1977) of these tests, which are under the Weighted Airman
Promotion System (WAPS). Under the proposed EEOC guidelines (1977), the need for
a strong validity position is paramount. The purpose of this study was to
evaluate the experimental application of the computer-derived test outline
procedure using the conventional outline as a criterion reference.

Conventional Outline

The  conventional  outline  development  procedure  described  by  Vaughan  (1976) 
has  been  used  consistently  over  two  decades  (USAF  Occupational  Measurement 
Center,  1977).    An  average  test  development  team  consists  of  four  subject-matter 
specialists  (SMSs).    SMSs  are  first  asked  to  divide  their  job  specialty  into 
major  divisions.    These  divisions  constitute  the  major  outline  areas.    The  major 
outline  areas  are  then  subdivided  as  appropriate.    Once  the  team  members  have 
reached  agreement,  they  are  asked  to  assign  percentage  weights  to  each  outline 
area.    The  resultant  percentage  weights  determine  the  number  of  test  questions 
to  be  written  for  each  division  of  the  job  specialty.    A  sample  of  the  outline 
format is shown in Figure 1. Percentage weights are determined by SMSs, based
on their knowledge and experience. Their judgment is supplemented by the
occupational survey data provided to them. Since test construction teams have
found survey data difficult to use, the contribution of survey data to the
outline development process has been minimal (Vaughan, 1976).

Insert Figure 1 about here
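
The step from percentage weights to a number of test questions per outline area can be sketched as below. The paper does not state the test length or the rounding rule, so both are assumptions here: the function name and the largest-remainder rounding are hypothetical choices, and the example weights are simply a set of three area weights summing to 100.

```python
def allocate_items(weights_pct, test_length):
    """Convert area percentage weights into whole numbers of test items.

    Uses largest-remainder rounding (an assumed rule) so that the item
    counts always add up to the requested test length.
    """
    exact = {area: w * test_length / 100.0 for area, w in weights_pct.items()}
    counts = {area: int(x) for area, x in exact.items()}
    leftover = test_length - sum(counts.values())
    # Hand the remaining items to the areas with the largest fractions.
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a],
                          reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts

area_weights = {"I": 46, "II": 20, "III": 34}
items_100 = allocate_items(area_weights, 100)
items_75 = allocate_items(area_weights, 75)
```

For a 100-item test the counts equal the percentage weights; for other lengths the rounding rule decides which areas receive the extra items.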


The views expressed in this paper represent those of the authors and
do  not  necessarily  reflect  the  views  of  the  United  States  Air  Force 
or  the  Department  of  Defense. 
Computer-Derived Outline


The procedure for developing a computer-derived test outline differs from
the procedure for developing the conventional outline. The initial computer
printout displays selected occupational survey tasks with testing importance
values. A sample page from a computer printout is shown in Figure 2. These
tasks are presorted by Specialty Training Standard (STS) paragraph. Therefore,
the test construction team only evaluates the printout and finalizes the major
outline areas. The SMSs can adjust the task sorting for major outline areas.
They can also adjust the percentage weights that have been determined by the
testing importance values. A conversion table is given to the team for
determining the equivalent percentage weights from the testing importance values
that are on an initial computer product (Table 1). The team must justify the
changes they make to the computer product. Like the conventional outline,
the resultant percentage weights determine the number of test questions to
be written for each division of the job specialty. Unlike the conventional
outline, occupational survey data is the basis for computer-derived outline
development.

Insert Figure 2 about here

Insert Table 1 about here
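
The conversion from a testing importance value to an equivalent percentage weight can be sketched as a simple proportion. This is an inference, not a documented formula: most printed entries of Table 1 are consistent with dividing the task's value by the skill-level total (142.59 for the 5-skill level, 90.53 for the 7-skill level) and rounding to one decimal. The function name is hypothetical.

```python
def percentage_weight(value, total_importance):
    """Share of the total testing importance carried by one task, in percent."""
    return round(100.0 * value / total_importance, 1)

# Totals as printed at the foot of Table 1 for the 328X3 specialty.
TOTAL_5_LEVEL = 142.59
TOTAL_7_LEVEL = 90.53

w5_high = percentage_weight(7.5, TOTAL_5_LEVEL)
w7_high = percentage_weight(7.5, TOTAL_7_LEVEL)
w5_mid = percentage_weight(5.0, TOTAL_5_LEVEL)
w7_low = percentage_weight(1.5, TOTAL_7_LEVEL)
```

Building this step into the computer program, rather than leaving it to a lookup table, is the change the teams asked for.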

Method 

Four test construction projects were selected for the evaluation of the
computer-derived outline. Each team consisted of four subject-matter
specialists (SMSs) from their respective career fields. The SMSs were either
selectees or held the Air Force grade of E-7, Master Sergeant, or higher. These
SMSs and the test psychologists who conducted each project voluntarily agreed to
use the computer-derived outline procedure. The four projects were as follows:
631X0, Fuel Specialist and Fuel Supervisor; 316X0F, Missile Systems Analyst;
328X3, Electronic Warfare; and 701X0, Chapel Management. For each project a
recently completed occupational survey was available. For the 328X3 and the
701X0, current computer programming also allowed for the presorting of the tasks
by Specialty Training Standard (STS) and the prerating of the tasks according to
the usability importance of the tasks for testing. An occupational survey task
was selected for the computer-derived outline if twenty percent or more of the
members performed the task. Supervisory tasks were not selected, nor were tasks
selected with resulting testing importance values of zero (Phalen, 1978).
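
The three selection rules just stated (at least twenty percent of members performing, not supervisory, nonzero testing importance) can be sketched as a filter. The `Task` record, its field names, and the sample tasks are hypothetical; only the rules themselves come from the text.

```python
from dataclasses import dataclass

@dataclass
class Task:
    title: str
    percent_performing: float   # percent of members performing the task
    testing_importance: float   # computed testing importance value
    supervisory: bool = False

def select_tasks(tasks, min_percent=20.0):
    """Keep tasks that meet all three selection rules."""
    return [t for t in tasks
            if t.percent_performing >= min_percent
            and not t.supervisory
            and t.testing_importance > 0]

# Hypothetical inventory entries.
inventory = [
    Task("Task A", 54.2, 4.85),
    Task("Task B", 12.0, 5.10),                    # too few members performing
    Task("Task C", 35.0, 6.00, supervisory=True),  # supervisory task
    Task("Task D", 25.0, 0.0),                     # importance floored to zero
]
selected = select_tasks(inventory)
```

Lowering `min_percent` to about 10 is the compensation suggested later for heterogeneous career fields.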

The first three test construction teams, 631X0, 316X0F, and 328X3, began
with the conventional outline development procedure and then evaluated and
finalized the computer-derived outline. The fourth team, 701X0, developed only
the computer-derived outline, independently of the conventional outline
procedure. Because of the relative consistency of conventional outlines from one
test revision to the next, the previous team's conventional outline was used as
the criterion for the 701X0 project. Verbal feedback was elicited from all of
the SMSs. The 328X3 and 701X0 teams also completed the Outline Questionnaire
(Fig 3).

Insert  Figure  3  about  here 


Using the conventional outline as a criterion, the resultant percentage
weights were compared for each major outline area. Percentage weight differences
were computed between the conventional outline major areas and the major areas
of the computer-derived outline. These differences were compared with the total
number of tasks printed on the computer product. Homogeneity of tasks, i.e.,
the commonality of tasks across the career field, was considered.
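
The percentage weight comparison can be sketched as follows, using the 5-skill level weights reported for the 316X0F project in Table 2 (A = conventional, B = initial computer product, C = final computer-derived outline). The function name is hypothetical; the differences and total absolute differences follow the column definitions (B-A), (C-A), and (C-B) used in the tables.

```python
def weight_differences(conventional, initial, final):
    """Per-area differences (B-A, C-A, C-B) and their total absolute values."""
    rows = [(b - a, c - a, c - b)
            for a, b, c in zip(conventional, initial, final)]
    totals = tuple(sum(abs(row[k]) for row in rows) for k in range(3))
    return rows, totals

# 316X0F, 5-skill level, major outline areas I, II, III (Table 2).
conventional = [46, 20, 34]
initial = [32, 21, 47]
final = [48, 9, 43]
rows, totals = weight_differences(conventional, initial, final)
```

The totals reproduce the "Total (Absolute)" row of Table 2: 28, 22, and 32.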

Responses to the Outline Questionnaire were separated into positive, negative,
or indifferent response to assess attitude toward the computer-derived outline.
The last questionnaire item was used to assess SMS position as to which outline
development procedure they preferred. Attitude toward the occupational survey
(third questionnaire item) was compared with the preferred outline development
procedure.

Results 

For all four test construction projects, the percentage weights for each
major outline area did not differ significantly between the final
computer-derived outline and the conventional outline. The comparison for the
316X0F project is shown in Table 2. The resultant percentage weights for the
computer-derived outline are presented for the initial computer product and
also for the final outline. The number of tasks selected for each skill level is
indicated in the footnote.

Insert Table 2 about here

The comparison for the 701X0 project is presented in Table 3. The large
difference in major outline area III resulted from the low number of selected
tasks that could be referenced to the career development course (CDC).

Insert Table 3 about here

Table 4 is the 631X0 project comparison. The zero weight for the conventional
and final computer-derived outline area V was a judgment decision by
the team. They felt the tasks for area V were more appropriate in other areas.
The 328X3 project comparison is shown in Table 5. Even though the number of
tasks selected for the 631X0 and the 328X3 projects was less than eleven percent
of the total number of job inventory tasks, the teams were able to develop a
final computer-derived outline. Because of the heterogeneity of the 328X3
career field due to the distinct differences in equipment from base to base, the
team concluded that an additional major outline area (III) was needed. The
new area was on basic principles which could be generalized across the career
field.

Insert Table 4 about here

Insert Table 5 about here

Table 6 shows the relationship of the total (absolute) percentage weight
difference between projects. There was a general trend for the percentage
weight differences to decrease as the total number of tasks selected increased
(JIC = -.71, p < .025; see footnote 1). In every case, the smallest percentage
weight difference was between the conventional outline and the final
computer-derived outline (JIC = .82, p < .01). This includes the 701X0 project,
during which the computer-derived outline was developed independently of the
conventional outline.

Insert Table 6 about here
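
The direction of the reported trend can be checked with an ordinary Pearson correlation, which is substituted here for the Jenkins Index of Covariation because the paper does not give that index's formula. The sketch pairs the total number of tasks selected with the total absolute (B-A) difference from Table 6 for all eight project and skill-level combinations; which difference column the author actually correlated is not stated, so that pairing is an assumption, as is the function name.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Table 6: 316X0F, 701X0, 631X0, 328X3 at the 5-level, then the 7-level.
tasks_selected = [142, 62, 38, 27, 140, 48, 26, 17]
diff_b_minus_a = [28, 68, 54, 128, 28, 80, 52, 100]
r = pearson_r(tasks_selected, diff_b_minus_a)
```

The coefficient is negative, in line with the reported trend: projects with more selected tasks show smaller differences between the initial computer product and the conventional outline.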

The  attitude  response  from  the  Outline  Questionnaire  is  presented  in  Table  7. 

Insert  Table  7  about  here 


There were nearly twice as many positive responses as negative (HLC = .57,
p < .005; see footnote 2). The response to the last item on the questionnaire
indicated that six SMSs would choose the computer-derived outline procedure over
the conventional outline. The remaining two SMSs were indifferent. A comparison
of the attitude toward the occupational survey (third questionnaire item) with
the preferred outline development procedure revealed a dichotomy. All four SMSs
on the 701X0 team responded negatively to the third item and three of the four
on the 328X3 team were not sure.

1. Jenkins Index of Covariation (Jenkins & Hatcher, 1976)
2. Hi-Lo Coefficient (Davidoff & Goheen, 1953)

Discussion

The purpose of this study was to evaluate the experimental application of
the computer-derived test outline procedure. Conventional test outlines were
used as a criterion reference during four actual test development projects.
For one of these projects only a computer-derived outline was developed,
independently of the conventional outline procedure. For the other three
projects, both types of outlines were developed. The resultant percentage
weights for major outline areas were compared. In every case the smallest
percentage weight difference was between the conventional outline and the final
computer-derived outline. The differences were not significant and the
relationship correlated .82 (p < .01). Overall, the more homogeneous the career
field, the larger the number of occupational survey tasks selected for the
initial computer product (correlation .71, p < .025). In conjunction with this
trend, the more homogeneous the career field, the smaller the percentage weight
differences. This meant that the more homogeneous the career field, the closer
the initial computer product was to reflecting the conventional outline. This
relationship was also shown with the computer-derived outline developed
independently of the conventional outline procedure. This finding substantiates
an existing feeling in test construction that the problems involved in
developing a test development plan decrease in proportion to the homogeneity of
the career field. A compensation for more heterogeneous career fields is to
decrease the task selection criterion from 20 percent to about 10 percent
members performing. Even though the additional tasks will be performed by a
small percentage of personnel, there are usually basic principles that can be
generalized across the career field.

On the Outline Questionnaire, there were nearly twice as many positive
responses as negative (p < .005). The response to the last item on the
questionnaire indicated that six out of the eight SMSs would choose the
computer-derived outline development procedure over the conventional. The other
two SMSs were indifferent. In comparison, the reaction to the occupational
survey data indicated a dichotomy. Four SMSs responded negatively, three were
indifferent, and one was positive. This comparison indicated that even though
the SMSs indicated reluctance to fully accept the occupational survey as a true
and complete picture of their career field, they recognized the advantage of
using the occupational survey in the test development process. The
computer-derived outline procedure caused the SMSs to become involved with the
occupational survey data. The SMSs admitted that the survey data enhanced their
ability to reach agreement on test content.

The four test construction teams felt that the computer product they
used was easy enough to follow. They agreed that the Specialty Training
Standard (STS) order was the logical format. Although an additional table was
furnished the team to assist them in converting testing importance values into
the suggested number of test questions for each task, this step was still too
complex. The conversion needs to be incorporated into the computer program. Even
then there will still be the need for the human element, i.e., the team's
evaluation of the computer product.

Every team felt a need to readjust the tasks shown on the initial computer
product and the resultant percentage weights for the major outline areas. This
step was the most complicated with the more heterogeneous career fields. Yet,
even with these adjustments, the relative time required to develop the
computer-derived outline, ranging from one-half day to a day and a half, is no
longer than the time used for development of the conventional outline. The
incorporation into the computer program of the conversion from testing
importance values to actual outline weights for each task will probably decrease
the amount of time required for outline development.


Four factors play a key role in the feasibility of fully implementing the
computer-derived outline procedure. The first is currency of the occupational
survey data, i.e., whether the occupational survey depicts the current career
field. The present surveying operation is closer to keeping up with career field
changes than ever before. The second is timeliness of computer related support.
Test development schedules are firm, so the necessary computer product must be
available at the beginning of the project. Once the need is confirmed, it is
possible to prepare the computer product well in advance of a project. The third
is the workload on personnel, i.e., being able to do the job within existing
resources. Existing test support activities should be evaluated to determine how
the present support procedures could be altered to fit the new outline
development procedures without increasing the workload on personnel. The fourth
factor is the SMS attitude toward occupational survey data. The briefing to the
SMSs about the survey should include a discussion of quality control measures
taken by the occupational survey activity to insure valid data. Also, steps
should be taken to insure that each person completing a job inventory for
occupational survey understands the importance of accurate responses.

Since  each  test  development  team  felt  the  need  to  readjust  task  distribution 
and  percentage  weights  on  the  initial  computer  product  for  their  final  outline, 
there  is  a  need  for  further  refinement  of  the  computer-derived  outline  procedure. 
The  evaluation  should  include  further  validation  of  the  formula  used  to  compute 
the  testing  importance  values. 

As a result of this study it can be concluded that the computer-derived
outline procedure is viable for test construction. The computer product is in a
format that is agreeable to the SMSs that have used it. The procedure for using
the computer product can be followed even by the individual who is not
acquainted with occupational survey data. The final computer-derived outline is
not significantly different from the time-tested conventional outline. However,
the computer-derived outline does directly incorporate occupational survey data
into test development procedures. The incorporation of occupational survey data
expands the input for test outline development from four SMSs to the field of
survey respondents. This expansion strengthens the content validity position of
the resultant test. The strength of the computer-derived outline along with the
feasibility of the procedure shown in this study indicate that the Occupational
Measurement Center should proceed to incrementally implement the
computer-derived outline procedure with concurrent evaluation.
OUTLINE QUESTIONNAIRE

Indicate the degree to which you agree with the following statements:

                                                      Agree        Disagree

1. I feel the computer-derived outline was easier
   to develop than the conventional outline.            A   B   C   D   E

2. In comparison with the conventional outline
   I feel the computer-derived outline more
   accurately reflects the true job situation
   in the field.                                        A   B   C   D   E

3. I have confidence that the survey data used
   to compile the computer outline is accurate
   and dependable.                                      A   B   C   D   E

4. I feel the format of the computer outline
   is difficult to understand.                          A   B   C   D   E

5. I found the computer outline product very
   easy to work with.                                   A   B   C   D   E

6. I feel that SKT-usable references are
   available for all areas listed in the
   computer outline.                                    A   B   C   D   E

7. I found that the computer product, as
   printed, sufficiently covered all STS areas.         A   B   C   D   E

8. I found that a substantial amount of
   information had to be added before the
   computer product could be used as a
   test outline.                                        A   B   C   D   E

9. Given a choice, I would prefer to develop
   an outline from the computer product rather
   than use the conventional method.                    A   B   C   D   E

Figure 3. Outline Questionnaire
Table 1

Conversion Table for Determining Equivalent Percentage Weights
from 328X3 Testing Importance Value

Testing Importance          Percentage Weight
Value                 5-Skill Level   7-Skill Level

7.5                        5.3             8.3
7.0                        4.9             7.7
6.5                        4.6             7.2
6.0                        4.2             6.6
5.5                        3.8             6.1
5.0                        3.5             5.5
4.5                        3.2             5.0
4.0                        2.8             4.4
3.5                        2.5             3.9
3.0                        2.1             3.3
2.5                        1.8             2.8
2.0                        1.4             2.2
1.5                        1.1             1.7

Total Testing Imp Value  142.59           90.53

Marginal labels printed beside the columns in the original: 5, 4, 3, 2, 1
(5-Skill Level) and 8, 7, 6, 2 (7-Skill Level).
Table 2


316X0F Percentage Weight Comparison of Conventional Outline
with Computer-Derived Outline*

Skill   Major        Conventional   Computer-Derived          Percentage Weight
Level   Outline      Outline %      Outline %                 Differences
        Area         (A)            (B=Initial)   (C=Final)   (B-A)   (C-A)   (C-B)

5       I                46             32            48       -14       2      16
        II               20             21             9         1     -11     -12
        III              34             47            43        13       9      -4
        Total           100            100           100        28      22      32
        (Absolute)

7       I                46             32            49       -14       3      17
        II               20             22             9         2     -11     -13
        III              34             46            42        12       8      -4
        Total           100            100           100        28      22      34
        (Absolute)

*Total number job inventory tasks: 783
 Total number tasks selected 5-level: 143
 Total number tasks selected 7-level: 140
Table 3

701X0 Percentage Weight Comparison of Conventional Outline
with Computer-Derived Outline*

Skill   Major        Conventional   Computer-Derived          Percentage Weight
Level   Outline      Outline %      Outline %                 Differences
        Area         (A)            (B=Initial)   (C=Final)   (B-A)   (C-A)   (C-B)

5       I                12             18            25         6      13       7
        II                3              2             2        -1      -1       0
        III**             9             37            10        28       1     -27
        IV               34             19            30       -15      -4      11
        V                40             24            33       -16      -7       9
        VI                2              0             0        -2      -2       0
        Total           100            100           100        68      28      54
        (Absolute)

7       I                26             24            30        -2       4       6
        II                1              1             0         0      -1      -1
        III**             4             44             5        40       1     -39
        IV               40             23            35       -17      -5      12
        V                27              8            30       -19       3      22
        VI                2              0             0        -2      -2       0
        Total           100            100           100        80      16      80
        (Absolute)

*Total number job inventory tasks: 216
 Total number tasks selected 5-level: 62
 Total number tasks selected 7-level: 48
**Although high percent members performing, low number of tasks could be referenced
to the Career Development Course (CDC)
Table 4


631X0 Percentage Weight Comparison of Conventional Outline
with Computer-Derived Outline*

Skill   Major        Conventional   Computer-Derived          Percentage Weight
Level   Outline      Outline %      Outline %                 Differences
        Area         (A)            (B=Initial)   (C=Final)   (B-A)   (C-A)   (C-B)

5       I                52             29            46       -23      -6      17
        II               25             21            31        -4       6      10
        III              13             13            17         0       4       4
        IV               10             29             6        19      -4     -23
        V**               0              8             0         8       0      -8
        Total           100            100           100        54      20      62
        (Absolute)

7       I                51             32            44       -19      -7      12
        II               26             19            33        -7       7      14
        III              15             15            20         0       5       5
        IV                8             18             3        10      -5     -15
        V**               0             16             0        16       0     -16
        Total           100            100           100        52      24      62
        (Absolute)

*Total number job inventory tasks: 374
 Total number tasks selected 5-level: 38
 Total number tasks selected 7-level: 26
**Team felt tasks were more appropriate under a separate heading
Table 5

328X3 Percentage Weight Comparison of Conventional Outline
with Computer-Derived Outline*

Skill   Major        Conventional   Computer-Derived          Percentage Weight
Level   Outline      Outline %      Outline %                 Differences
        Area         (A)            (B=Initial)   (C=Final)   (B-A)   (C-A)   (C-B)

5       I                 4             21            14        17      10      -7
        II               32             79            37        47       5     -42
        III**            64              0            49       -64     -15      49
        Total           100            100           100       128      30      98
        (Absolute)

7       I                11             24            24        13      13       0
        II               39             76            48        37       9     -28
        III**            50              0            28       -50     -22      28
        Total           100            100           100       100      44      56
        (Absolute)

*Total number job inventory tasks: 76b
 Total number tasks selected 5-level: 27
 Total number tasks selected 7-level: 17
**Team desired a basic principles area
Table 6

Comparison By Project of Total (Absolute) Percentage Weight Differences
Between Conventional Outline (A) and Computer-
Derived Outline (B=Initial; C=Final)*

                  5-Skill Level                             7-Skill Level
              Percentage Weight Difference   Total #    Percentage Weight Difference   Total #
              (B-A)    (C-A)    (C-B)        Tasks      (B-A)    (C-A)    (C-B)        Tasks
                                             Selected                                  Selected

316X0F          28       22       32          142         28       22       34          140
701X0           68       28       54           62         80       16       80           48
631X0           54       20       62           38         52       24       62           26
328X3          128       30       98           27        100       44       56           17

Range of
Percentage     100       10       66                      72       28       46
Difference

*C-A differences not significant. Generally as total tasks selected increases,
percentage weight differences decrease (correlation -.71, p<.01). Smallest
percentage weight difference C-A (correlation .82, p<.01).


Table 7

Positive (+), Indifferent (0), and Negative (-) Attitudes
Toward the Computer-Derived Outline Procedures Based
on Responses to the Outline Questionnaire*

Attitude        Test Development Project
Response        328X3       701X0       Sum

+                 21          10         31
0                 10          14         24
-                  5          12         17

Sum               36          36         72

*Significant difference between positive and negative (p<.005).


A  Generalization  of  Sequential  Analysis 
to  Decision  Making  with  Tailored  Testing 

by 

Mark  D.  Reckase 
University  of  Missouri-Columbia 


During the last decade, there has been increasing interest in the
individualization of instruction and the maintenance of high standards
of quality in the students graduated from instructional programs. Both
individualization and the maintenance of quality require achievement
measurement  procedures  that  can  accurately  determine  whether  a  student 
is  above  or  below  a  pre-set  criterion  score.    Also,  the  relatively  new 
areas  of  criterion-referenced  measurement  and  mastery  learning  programs 
require  accurate  procedures  for  classification  into  the  two  groups 
(pass  and  fail)  for  their  operation.    If  the  classification  can  be  done 
quickly  with  only  a  few  test  items,  this  would  be  a  desirable  attribute 
for  a  procedure. 

Most decision procedures described in the current literature are
based on sampling a fixed number of test items for a domain and using
either classical or Bayesian decision rules for determining a person's
position relative to a criterion (see Millman, 1974 and Hambleton,
Swaminathan, Algina and Coulson, 1978 for reviews of these techniques).
However, a family of procedures exists that has been shown to yield
a smaller expected sample size for testing many hypotheses while holding
the power of the test at the same level as the fixed sample size
procedures (Wald, 1947). These are sequential procedures that have
the characteristics of taking single observations and deciding after
each observation if a classification should be made or if more information
is needed; that is, if another observation should be taken.
For many classifications, sequential procedures have been proven to
be much more efficient than fixed sample size procedures by exhibiting
high accuracy with relatively small sample sizes (Wald, 1947).

This research was supported by contract number N00014-77-C-0097 from the
Personnel and Training Research Programs of the Office of Naval Research.

A simple example will be used to show how the number of test items
used for classifying a student can be reduced while the accuracy of
classification stays the same as for a full length test. Suppose a
student who has not mastered the material from a unit of instruction
is given a ten item quiz for the purpose of diagnosing that fact.
Suppose further that an 80% criterion has been set for success. The
usual procedure would be to give the ten item quiz, score it, and if
the score were seven or less, give remedial instruction. When using
a sequential procedure, items would be administered one at a time and
testing would stop as soon as three items were missed. The largest
number of items administered would be ten, so the average number
administered must be less than ten, but the same classification
criterion has been used for both testing procedures.
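
The saving in the ten item example can be put in numbers. The following is a minimal sketch, not part of the original study: it assumes that a student answers every item correctly with the same probability and that items are independent, and the function name expected_items is hypothetical. It computes the expected number of items given when testing stops at the third miss or after ten items, whichever comes first.

```python
from math import comb

def expected_items(p_correct, max_items=10, max_misses=3):
    """Expected number of items given when testing stops at the
    max_misses-th miss or after max_items items, whichever comes first.

    Assumes independent items with a constant probability of a correct
    response (an illustrative simplification)."""
    q = 1.0 - p_correct
    expected = 0.0
    prob_early_stop = 0.0
    # Stopping at item n < max_items means the final miss falls on item n
    # and exactly (max_misses - 1) misses occurred among the first n - 1.
    for n in range(max_misses, max_items):
        p_stop = (comb(n - 1, max_misses - 1)
                  * q ** max_misses * p_correct ** (n - max_misses))
        expected += n * p_stop
        prob_early_stop += p_stop
    # Every other response pattern runs the full max_items items.
    expected += max_items * (1.0 - prob_early_stop)
    return expected

for p in (0.3, 0.5, 0.7, 0.9):
    print(p, round(expected_items(p), 2))
```

Under these assumptions the expected length falls well below ten for students far from mastery and approaches ten as the probability of a correct response approaches 1.0.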

A particular sequential procedure that has been applied to measurement
decision problems in the past and that has shown promise for the
future is the sequential probability ratio test (SPRT) developed by
Wald (1947). This procedure has been thoroughly analysed within the
mathematical statistics framework (Govindarajulu, 1975) and has recently
been rediscovered by measurement theorists (Sixtl, 1974; Epstein, 1978).
In this paper the SPRT will be generalized to tailored testing applications.
However, a brief description of this sequential decision model
will be given first.

The  Sequential  Probability  Ratio  Test  (SPRT) 

The sequential probability ratio test was originally developed
to determine which of two population parameter values is most likely
true for a given set of data. For example, one might be interested
in determining whether the proportion failing a criterion-referenced
test is more likely .5 or .8. If a certain three of five students
sampled from a population fail to exceed a criterion, this event would
have a probability of $.5^5 = .03125$ if the .5 hypothesis were correct
and $.8 \times .8 \times .8 \times .2 \times .2 = .02048$ if the .8
hypothesis were correct. The question now becomes whether the difference
in these two probabilities is great enough to select the .5 hypothesis
over the .8 hypothesis.

To make this decision, Wald took the ratio of the two probabilities,
$.02048 / .03125 = .65536$. If the ratio were sufficiently larger than 1.0, the
.8 probability would be accepted as correct. If it were much smaller
than 1.0, the .5 probability would be considered as correct. Note
that for the sequential procedure, this ratio would be computed after
each observation, and a decision concerning the .5 or .8 parameter would
be made as soon as the ratio passed either an upper or lower cutoff
value.

To totally specify the SPRT procedure some means must be given
to determine the two cutoff values. These cutoffs are
directly dependent on the error rates that are deemed acceptable in
choosing between the two parameter values. The probability of choosing
.8 when .5 is true is defined as an $\alpha$ error and the probability of
choosing .5 when .8 is true is defined as a $\beta$ error. Wald has shown
that these error rates will be at least as low as the values of $\alpha$ and
$\beta$ when the two cutoff values are set at:

$$\text{lower cutoff} = B = \frac{\beta}{1 - \alpha}$$

$$\text{upper cutoff} = A = \frac{1 - \beta}{\alpha}$$

If $\alpha$ is set at .02 and $\beta$ at .1, the cutoff values are A = 45 and B =
.102. If after each observation the ratio of the probabilities is
more extreme than either test value, the appropriate parameter value
is accepted as true. If it is between the two cutoffs another observation
is taken.
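
The cutoff rule can be sketched in a few lines. This is an illustrative sketch only; the function names wald_cutoffs and sprt_decision are hypothetical, and the ratio is taken as the probability of the data under the .8 value divided by the probability under the .5 value, as in the three-of-five example.

```python
def wald_cutoffs(alpha, beta):
    """Return (A, B): the upper and lower cutoffs for the probability ratio."""
    upper = (1.0 - beta) / alpha
    lower = beta / (1.0 - alpha)
    return upper, lower

def sprt_decision(ratio, upper, lower):
    """Compare a probability ratio with the two cutoffs."""
    if ratio >= upper:
        return "accept upper value"
    if ratio <= lower:
        return "accept lower value"
    return "continue"

A, B = wald_cutoffs(alpha=0.02, beta=0.10)

# Three of five students fail: probability under .8 over probability under .5.
ratio = (0.8 ** 3 * 0.2 ** 2) / (0.5 ** 5)

print(round(A, 3), round(B, 3), round(ratio, 5), sprt_decision(ratio, A, B))
```

With these error rates the ratio for the five observations lies between the two cutoffs, so under the sequential rule another observation would be taken.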

Testing the hypothesis that .5 is correct against the hypothesis
that .8 is correct is seldom of interest in a criterion-referenced
testing setting. A more common hypothesis is that a person is below
a cutoff value as opposed to being above the value. Wald has shown
that this complex hypothesis can be tested in the same way as the two
simple hypotheses by selecting a cutoff value and then specifying a
region of indifference around the cutoff in which the classification
as to above or below the cutoff is equally good. The lower end of the
indifference region is used as the lower simple hypothesis, $H_0$, while
the upper end of the region is used as the upper simple hypothesis,
$H_1$. The A and B values used in the significance test are determined
in the same manner as above.


An example of testing this type of complex hypothesis in a
criterion-referenced testing situation can be given as follows. Suppose we want
to determine if a student can answer 90% of the items in an item domain.
Suppose also that we are indifferent as to whether they are classified
as high or low in the region from 89% to 95%. We would then randomly
select items one at a time from the domain and determine the probability
of the response strings under $H_0$: $\pi = .89$ and $H_1$: $\pi = .95$. The ratio
of the probabilities of the response strings would be computed as
previously described and then compared to the A and B decision values.
If the ratio were below the B value, the person would be classified
as below the criterion; if it were above the A value, the person would
be classified as above the criterion. If the ratio were between A
and B, another item would be administered.
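
The item-by-item rule for this example might be sketched as follows. This is only an illustration under stated assumptions: the function name sprt_mastery is hypothetical, responses are coded 1 (correct) or 0 (incorrect), the error rates are the .02 and .1 values used earlier, and logarithms of the ratio are accumulated purely for numerical convenience.

```python
import math

def sprt_mastery(responses, pi0=0.89, pi1=0.95, alpha=0.02, beta=0.10):
    """Classify a sequence of 0/1 item responses with a binomial SPRT.

    Returns (decision, number of items used), where decision is
    "above", "below", or "undecided" if the responses run out first."""
    log_upper = math.log((1.0 - beta) / alpha)   # log of A
    log_lower = math.log(beta / (1.0 - alpha))   # log of B
    log_ratio = 0.0
    n = 0
    for n, x in enumerate(responses, start=1):
        if x:
            # Correct response: probability pi1 under H1, pi0 under H0.
            log_ratio += math.log(pi1 / pi0)
        else:
            # Incorrect response: 1 - pi1 under H1, 1 - pi0 under H0.
            log_ratio += math.log((1.0 - pi1) / (1.0 - pi0))
        if log_ratio >= log_upper:
            return "above", n
        if log_ratio <= log_lower:
            return "below", n
    return "undecided", n

print(sprt_mastery([0, 0, 0]))      # three straight misses
print(sprt_mastery([1] * 100))      # an unbroken run of correct answers
```

Because the two hypothesized proportions are close together, a miss moves the ratio far more than a correct answer does, so under these assumptions a "below" decision can come after very few items while an "above" decision needs a long run of correct responses.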

Note  that  in  this  example  items  were  randomly  sampled  one  at  a 
time  without  replacement  and  then  administered.    This  is  called  a 
sequential  random  sample  and  it  is  one  of  the  basic  assumptions  used 
in deriving the method.

Description  of  the  Characteristics  of  SPRT 

When using a SPRT for decision making, two functions are derived
to describe the accuracy and efficiency of the procedure. The first
is called the operating characteristic (OC) function of the test.
The OC function gives the probability of accepting the null hypothesis
as a function of the unknown parameter of interest, $\theta$. For
criterion-referenced testing, the null hypothesis is usually that the examinee
is below the criterion. Typically the plot of this function
is an S-shaped curve asymptoting at 1.0 on the left and 0.0
on the right (see Figure 1). Wald has shown that at the lower simple
hypothesis value, $\theta_0$, the curve will have a height of $1 - \alpha$, while at
the upper critical value, $\theta_1$, the height is $\beta$. The slope of the function
between these two points is dependent on the width of the indifference
region: the wider it is, the flatter the slope. Finally, the point of
inflection of the curve is usually near the decision point. An ideal
OC curve would approximate a step function, dropping abruptly from a
probability of 1.0 of accepting the null hypothesis below the decision
criterion to a probability of 0.0 of accepting the null hypothesis
immediately above the criterion.


Insert  Figure  1  about  here 


The second function used to evaluate the operation of the SPRT
decision rule is the average sample number (ASN) function. This
ASN function gives the average number of observations required to
make a decision as a function of the variable used to make the decision.
This function typically plots as a unimodal curve with its mode near
the decision point (see Figure 1). The curve asymptotes to zero
in either direction from the mode. Since this function gives an
indication of how many observations are required for a decision, the
lower the modal value the better. That is, corresponding to a given
OC function, we would like the ASN function to be as low as possible
throughout the variable range, indicating that only a few observations
are required. Another desirable feature for an ASN function is a
quick decline from the mode, indicating that decisions require few
observations if a person is not near the decision point.

The magnitudes of the values of these two functions are related
to each other. As the slope of the OC function increases, the values
of the ASN function will usually increase. If a flatter OC is acceptable,
the values of the ASN function will be less. In using a SPRT,
a compromise must be reached between precision (as shown by the OC
curve) and sample size (as shown by the ASN function). Both of these
functions will be used to evaluate the SPRT for use with tailored
testing.

Generalization  of  the  SPRT  to  Tailored  Testing 

As mentioned above, the SPRT as developed by Wald assumes that
observations are taken using a sequential random sample. In a
criterion-referenced test, this would mean that items would be selected and
administered at random one at a time from a domain of items. Although
random sampling is philosophically acceptable with criterion-referenced
testing, it is at odds with the purposes of tailored testing. In this
latter case, the purpose is to select items to match the abilities of
each pupil rather than to administer a random selection. As a result
of matching items to pupils, the testing situation should be more
efficient and accurate. Since the purpose of this paper is to merge
the SPRT procedure with tailored testing, an initial task is to determine
whether the sequential random sample assumption is really necessary.

A detailed analysis of Wald's (1947) work by the present author
indicates that the assumption was only needed to make the derivation
of the OC and ASN functions possible. Without the assumption, the
characteristics of each item must be specified, resulting in many
nuisance parameters that cannot be eliminated. However, the test
statistics still operate in the same way, so the procedure can still
be used. The OC and ASN functions will be developed using simulations
in this paper since they cannot be developed using the usual formulas.
An example will now be given showing the use of the SPRT with tailored
testing.

Suppose it is desirable to determine whether a student's performance
on a module of instruction is above or below a pre-set criterion score.
Since the origin of the latent trait ability scale is arbitrary, the
criterion score can be set at 0.0 without loss of generality. An
indifference region must now be specified for this criterion score.
Assume that ability estimates in the region around 0.0 have been found
to have a standard error of .3 for the population and item pool of
interest. Therefore, the indifference region will be specified as
-.3 to +.3, and $\theta_0 = -.3$ and $\theta_1 = +.3$ are used for the SPRT.

Next, the acceptable error rates for the classification decision
must be specified. For this decision suppose it was felt to be a more
serious error to classify a person above the criterion score when they
should have been classified low, than to classify below when they should
have been above. Therefore $\alpha$ was set at .02 and $\beta$ at .1, and the two
classification values for the SPRT would then be A = 45 and B = .102.

With the specification of this preliminary information, the operation
of the SPRT can begin. When no previous information is available
about a student, the tailored testing procedure first administers
an item of moderate difficulty. Using a one-parameter logistic model,
this first item has a difficulty value, b, of 0.0. Suppose the
student gives a correct response to this item. The probability of
this response under $\theta_0 = -.3$ is given by

$$P_{01} = \frac{e^{(\theta_0 - b_1)}}{1 + e^{(\theta_0 - b_1)}} = \frac{e^{(-.3 - 0)}}{1 + e^{(-.3 - 0)}} = .426$$

where $P_{01}$ is the probability of the response after one item under $H_0$,
and the formula is that of the one-parameter logistic model. The
probability under $\theta_1 = .3$ is given by

$$P_{11} = \frac{e^{(\theta_1 - b_1)}}{1 + e^{(\theta_1 - b_1)}} = \frac{e^{(.3 - 0)}}{1 + e^{(.3 - 0)}} = .574$$

where $P_{11}$ is the probability of the response after one item under $H_1$.
The value of the SPRT is given by

$$\frac{P_{11}}{P_{01}} = \frac{.574}{.426} = 1.35$$

Since this value is between A and B, no decision can be made and another
item should be administered. Since the first item was responded to correctly,
a more difficult item will now be administered to try to match the
person's ability, say an item of +.7 difficulty. If an incorrect response
is obtained to this second item, the probability of the 1, 0 response
string under $\theta_0$ is

$$P_{02} = \prod_{i=1}^{2} P_i(\theta_0)^{x_i} Q_i(\theta_0)^{1 - x_i} = .426 \times .731 = .311$$

where $P_{02}$ is the probability of the response string after two items,
given $\theta_0$; $P_i(\theta_0)$ is the probability of a correct response to Item i,
given $\theta_0$; $Q_i(\theta_0)$ is the probability of an incorrect response; and $x_i$
is the response to Item i (0 or 1).

Under $\theta_1$, the probability of the response string is

$$P_{12} = \prod_{i=1}^{2} P_i(\theta_1)^{x_i} Q_i(\theta_1)^{1 - x_i} = .574 \times .599 = .344$$

The SPRT is then equal to

$$\frac{P_{12}}{P_{02}} = \frac{.344}{.311} = 1.11$$

Since this value is still between A and B, no decision can be made and
a third item should be administered. The procedure would then continue
in the same way until the ratio is more extreme than A or B. At that
point, the appropriate decision would be made and testing would stop.
In theory a very large number of items could be administered before
a decision is made, although Wald has proven that the number is finite.
However, in practice some reasonable upper limit is set on the number
of items administered, 20 for example, and a decision is made after
the twenty items on the basis of whether the probability ratio is above
or below 1.0. This procedure is called a truncated SPRT.
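
The two-item example and the truncation rule can be sketched as follows. This is a minimal illustration, not the program used in the study: the function names are hypothetical, the probability of an incorrect response is taken as one minus the one-parameter logistic probability of a correct response, and a ratio of exactly 1.0 at the item limit is classified as below, which the text does not specify.

```python
import math

def p_correct(theta, b):
    """One-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def likelihood_ratio(responses, difficulties, theta0=-0.3, theta1=0.3):
    """Ratio of the response-string probability under theta1 to that under theta0."""
    ratio = 1.0
    for x, b in zip(responses, difficulties):
        p0, p1 = p_correct(theta0, b), p_correct(theta1, b)
        if x == 1:
            ratio *= p1 / p0
        else:
            ratio *= (1.0 - p1) / (1.0 - p0)
    return ratio

def truncated_sprt(responses, difficulties, upper=45.0, lower=0.102,
                   max_items=20):
    """Apply the SPRT item by item, forcing a decision at max_items."""
    n_used = min(len(responses), max_items)
    ratio = 1.0
    for n in range(1, n_used + 1):
        ratio = likelihood_ratio(responses[:n], difficulties[:n])
        if ratio >= upper:
            return "above", n
        if ratio <= lower:
            return "below", n
    if n_used == max_items:
        # Truncation: classify by whether the ratio is above or below 1.0
        # (a ratio of exactly 1.0 is treated as below; an assumption).
        return ("above" if ratio > 1.0 else "below"), n_used
    return "continue", n_used

# Item 1: difficulty 0.0, answered correctly; item 2: difficulty +.7, missed.
print(round(likelihood_ratio([1], [0.0]), 3))
print(round(likelihood_ratio([1, 0], [0.0, 0.7]), 3))
print(truncated_sprt([1, 0], [0.0, 0.7]))
```

Recomputing the whole ratio after each item keeps the sketch close to the description in the text; an operational program would more likely update a running product.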

As mentioned earlier, one of the assumptions of the SPRT is a
sequential random sample. Since that assumption is not met, and also
since in real situations the procedure may be truncated, it is impossible
to derive the ASN and OC functions. Therefore, the major purpose of
this paper was to determine these functions through simulations and use
this information to evaluate the procedures for use with
criterion-referenced tailored testing.

Method 

The OC and ASN functions were determined for tailored tests using
both the one- and three-parameter logistic models based on maximum
likelihood estimation of abilities. The three-parameter logistic model
is an extension of the one-parameter model that includes discrimination
and guessing parameters (see Lord and Novick, 1968, for further
information). Simulations were used in both cases. The tailored testing
procedures used have been described in detail by Koch and Reckase (1978),
so they will not be described again here. However, to distinguish the
techniques from other procedures, it should be stated that the procedures
begin with an item of average difficulty and operate on a fixed
step-size procedure until both a correct and an incorrect response are present. At
that point a maximum likelihood ability estimate is obtained and the
next item is selected to yield maximum information for that ability
estimate. The procedures terminate when appropriate items are no longer
available or if twenty items have been administered, whichever occurs
first.

The simulated tailored testing procedures were identical to those
described above, except that a random number generator replaced the
human examinee. At the beginning of each simulation run the true ability
of the simulated examinee was input into the program. This value was
used to determine the true probability of a correct response to the
administered items based on the model used (one- or three-parameter
logistic) and the estimated item parameters. A number was then randomly
selected from a uniform distribution on the range from 0 to 1. If the
selected number was less than the probability of a correct response, a
correct response was recorded; otherwise an incorrect response was assigned.
This procedure continued for each item in the tailored test.
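
The response-generation step can be sketched as below. This is an illustrative sketch rather than the program used in the study: the function names are hypothetical, the item parameters shown are made up, and the three-parameter logistic model is written without the 1.7 scaling constant that some presentations include. Setting a = 1 and c = 0 gives the one-parameter model.

```python
import math
import random

def p_correct_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response.

    a = discrimination, b = difficulty, c = guessing (lower asymptote).
    No 1.7 scaling constant is applied (an assumption of this sketch)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def simulate_response(theta, a, b, c, rng):
    """Draw a uniform number on [0, 1) and score the item correct (1)
    when the draw is below the model probability, otherwise incorrect (0)."""
    return 1 if rng.random() < p_correct_3pl(theta, a, b, c) else 0

rng = random.Random(12345)   # the seed value is arbitrary
true_ability = 0.5
# Hypothetical (a, b, c) item parameters, for illustration only.
items = [(1.0, 0.0, 0.0), (1.2, 0.7, 0.2), (0.8, -0.5, 0.15)]
print([simulate_response(true_ability, a, b, c, rng) for a, b, c in items])
```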

Tailored tests were administered twenty-five times at each true
ability using different seed numbers for the random number generator.
True abilities from -3 to +3 at .25 intervals were used for the one-
and three-parameter models to evaluate the SPRT. Indifference regions
of ±.3, ±.8, and ±1 were used in the evaluation. All simulations used
the item parameters from a pool of 72 vocabulary items. This item pool
had an approximately normal distribution of difficulty parameters.

During the administration of the tailored tests, probability ratios
were computed after each item was administered. A decision was made
to classify a person above or below the cutoff by comparing the SPRT
value to an A value of 45 and a B value of .102, determined using
$\alpha = .02$ and $\beta = .10$. A classification was made the first time these limits
were exceeded. If the limits were not exceeded before the termination
of the test, values above 1.0 were classified as above and the values
below 1.0 were classified as low. At each true ability used for the
simulation, the proportion of the 25 administrations classified low and
the average number of items administered were computed. Plots of these
values against the true abilities approximate the OC and ASN functions,
respectively.
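
The way the OC and ASN values are estimated can be sketched as follows. This is a deliberately simplified illustration and not the tailored testing procedure of Koch and Reckase (1978): every item is assumed to follow the one-parameter logistic model with difficulty 0.0 instead of being selected adaptively from the 72-item pool, a single seeded generator is used for all runs, and a ratio of exactly 1.0 at the twenty-item limit is counted as low. All function names are hypothetical.

```python
import math
import random

def p_correct(theta, b=0.0):
    """One-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def run_sprt(true_theta, rng, theta0=-0.3, theta1=0.3,
             alpha=0.02, beta=0.10, max_items=20, b=0.0):
    """Simulate one truncated SPRT; return (classified_low, items_used)."""
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    p0, p1 = p_correct(theta0, b), p_correct(theta1, b)
    log_ratio = 0.0
    for n in range(1, max_items + 1):
        correct = rng.random() < p_correct(true_theta, b)
        if correct:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1.0 - p1) / (1.0 - p0))
        if log_ratio >= log_a:
            return False, n
        if log_ratio <= log_b:
            return True, n
    # Item limit reached: classify by the ratio relative to 1.0
    # (a tie is counted as low; an assumption of this sketch).
    return log_ratio <= 0.0, max_items

def estimate_oc_asn(abilities, replications=25, seed=0):
    """Map each true ability to (proportion classified low, mean items used)."""
    rng = random.Random(seed)
    results = {}
    for theta in abilities:
        runs = [run_sprt(theta, rng) for _ in range(replications)]
        oc = sum(low for low, _ in runs) / replications
        asn = sum(n for _, n in runs) / replications
        results[theta] = (oc, asn)
    return results

abilities = [-3.0 + 0.25 * i for i in range(25)]   # -3 to +3 in .25 steps
for theta, (oc, asn) in estimate_oc_asn(abilities).items():
    print(f"{theta:+.2f}  OC={oc:.2f}  ASN={asn:.1f}")
```

With only 25 replications per ability the estimated curves are rough, which is consistent with the text's statement that the plots approximate the two functions.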

Results 

The  results  of  this  research  will  be  described  in  two  parts;  one 
for  the  one-parameter  logistic  model,  and  the  other  for  the  three- 
parameter  logistic  model.    The  plots  of  the  OC  and  ASN  functions 
summarize  the  results  of  the  SPRT  for  these  models. 

Figure 2 shows the OC functions for the one-parameter logistic
model based on the vocabulary item pool. The figure shows three graphs,
one for each of the ±.3, ±.8, and ±1 indifference regions. Note that
the curves are reasonably similar regardless of the indifference region.
The similarity indicates that in all three cases the classification
accuracy is nearly the same.

Insert Figure 2 about here

The values of the curves at the limits of the indifference region give
further evaluative information. At the lower point, the OC function
should pass through $1 - \alpha$. At the -.3 value, the curve is in fact .85
when it should be .98, showing the degrading effects of restrictive stopping
rules used by the tailored testing procedure. At the -.8 and -1 points
for the corresponding curves, the results are about as expected, being
.94 and 1.00 rather than .98.

At the upper limit of the indifference region the OC function
should have a value of .1. For the .3 case it is in fact .5 rather
than .1, again showing the effects of truncating the procedure. At the
values of .8 and 1, the values of the OC function were near or better
than what they should have been based on the theoretically expected
results.
The ASN functions for the one-parameter model are given in Figure
3. The curves plotted correspond to the ASN functions using indifference
regions of ±.3, ±.8, and ±1. It can immediately be seen from the graph
that there is a substantial difference in the average number of items
needed to reach a decision, with the greatest number required when the
indifference region is narrowest. It can also be seen that the largest
expected number of items is near the decision point of 0.0 and that the
average number drops off at the extreme abilities. The slight lack
of symmetry in the curves is due to the fact that $\alpha$ was not equal to $\beta$.
For abilities beyond ±1, an average of only about 3 to 5 items was needed
for classification for the wider regions, while 6 to 11 were needed for
the ±.3 indifference region. Note that the ±.3 curve is approaching
the arbitrary twenty item limit for the tailored tests, possibly reducing
its magnitude.


Insert Figure 3 about here


Figure 4 shows the theoretical curves for the ASN and OC functions
based on the [illegible] indifference region for comparison purposes. An
infinite number of items with difficulty 0.0 was assumed for the
theoretical functions and the tests were assumed to have no upper limit
on the number of items administered. A comparison of Figures 2 and 3
with Figure 4 shows that the OC curve for the theoretical function is
steeper at the [illegible] than the simulated curves and the ASN function
is substantially higher. The difference in the theoretical and simulated
OC curves shows the effect of the tailored testing stopping rule.


Insert Figure 4 about here


The results of the simulation of the three-parameter logistic tailored
test are given in Figures 5 and 6. Figure 5 presents the OC functions
for the three-parameter model, again using the indifference regions of
±.3, ±.8, and ±1. Notice that, as with the one-parameter model, the
OC curves are fairly similar for the three indifference regions throughout
most of the range of ability. However, there are discrepancies for
the ±1.0 indifference range curve near the +1 and -1 points, indicating
a decline in decision precision for that region. At the -.3 value for
the ±.3 indifference region, the value of the curve is .96, fairly close
to the .98 theoretical value. At the upper end (.3), however, the value
is .2 instead of .1 as it should be. This may show the effects of
guessing on the decision process. The ±.8 and ±1 indifference regions
again yield better error probabilities than would be expected from the
theory.

The ASN function for the three-parameter model (Figure 6) also
shows similar results to those obtained from the one-parameter model.
The ±.3 indifference region required the greatest number of items,
while ±.8 and ±1.0 required about the same. As before, the largest
number was required near the decision point. However, with the
three-parameter model, far fewer items on the average were required to make a
decision. Of special note is the ASN value of about one in the -1 to
-3 range on the ability scale. Decisions seem to be possible with very
few items in that range.


Insert Figures 5 and 6 about here


Because of the guessing component of the three-parameter logistic
model, the ASN function tended to yield more non-symmetric results
than the one-parameter model. More items were required when classifying
high than for classifying low to compensate for the non-zero probability
of a correct response. Also, the ASN curve for the ±.3 indifference region
was more peaked than its one-parameter counterpart. If the simulated
curves for the three-parameter model are compared to the theoretical
curves presented in Figure 4, the OC functions can be seen to match
the theoretical functions fairly closely while the ASN functions show
that substantially fewer items were required. Over much of the ability
range, as many as 120 times as many items were specified by the theoretical
ASN curve where infinite and identical items were assumed.


Summary and Conclusions

In the research presented here, a version of the sequential probability
ratio test modified to operate within a tailored testing system has
been evaluated using simulation methods. A certain amount of realism
was attempted in the simulation by using latent trait item parameters
derived from the calibration of a real pool of vocabulary items. Also,
the simulation carried out the tailored testing within the limitations
of a finite item pool and a twenty-item maximum that was imposed in
an actual testing setting (Koch and Reckase, 1978).

Using the simulation data derived under these circumstances, two
functions were estimated, based on either the one- or three-parameter
logistic models, that can be used to evaluate the quality of the SPRT
for decision making under tailored testing. The two functions are
the OC and ASN functions.

Analysis of the OC functions obtained from the simulations, using
several different indifference regions, yields three important results.
First, the curves are very similar across the various indifference
regions. This probably indicates that not enough items were available
for the SPRT to function properly with the ±.3 indifference region to
take advantage of its theoretically greater accuracy. It should be
recalled that at most twenty items were administered, and often less
than that number was used because appropriate items were not available
in the item pool. The three-parameter OC curves were slightly steeper
in the middle range, illustrating the advantage of being able to select
the most discriminating items with that model.

[Figure: Three-Parameter ASN Functions for Three Indifference Regions]

The second result is that in some cases, the curves did not pass
through the points determined by the pre-set error rates. For some of
these cases the obtained errors of classification were greater than the
expected ones. This fact is also probably due to the restrictions
placed on the number of items administered. This is demonstrated by
the large difference between the theoretical ASN curves and the actual
ones.

The third result of interest dealing with OC curves is that the
curves at the limits of the ±.8 and ±1 indifference regions tend to give
better results than expected from the theoretical model.  This is
probably due to the advantages accrued by selecting items using the
tailored testing algorithm rather than selecting them randomly from
the item pool.  Obviously, more research is needed to confirm these
conjectures.

Directly  related  to  the  results  obtained  based  on  the  OC  curves 
are those obtained using the ASN curves.  Although the OC curves were
similar across indifference regions, the ASN functions show substantial
differences in the number of items administered.  This fact implies
that  the  size  of  the  indifference  region  should  be  determined  by  the 
limits  imposed  by  the  quality  of  the  item  pool  and  the  length  of  the 
testing  session.    Wider  indifference  regions  reduce  the  number  of  items 
required  without  too  much  loss  of  precision  in  the  cases  analyzed  here. 
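
The trade-off between the width of the indifference region and the
number of items administered can be illustrated with a small simulation.
The following is a minimal sketch under stated assumptions, not the
procedure used in this study: it assumes a one-parameter logistic model,
items whose difficulty equals the cutting score (a stand-in for tailored
item selection), error rates of .05, and a forced decision at twenty
items based on the sign of the log likelihood ratio.  All function names
and default values are hypothetical.

```python
import math
import random


def logistic(theta, b):
    """One-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def sprt_classify(theta, cut=0.0, delta=0.3, alpha=0.05, beta=0.05,
                  max_items=20, rng=None):
    """Classify one simulated examinee against a cutting score.

    The hypotheses are ability = cut - delta versus ability = cut + delta,
    so delta is the half-width of the indifference region.
    Returns (decision, number_of_items_used).
    """
    rng = rng or random.Random()
    upper = math.log((1.0 - beta) / alpha)   # Wald bound for "above"
    lower = math.log(beta / (1.0 - alpha))   # Wald bound for "below"
    b = cut                                  # assumed item difficulty
    p0 = logistic(cut - delta, b)
    p1 = logistic(cut + delta, b)
    llr = 0.0
    for n in range(1, max_items + 1):
        if rng.random() < logistic(theta, b):        # simulated correct answer
            llr += math.log(p1 / p0)
        else:                                        # simulated wrong answer
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "above", n
        if llr <= lower:
            return "below", n
    # Item limit reached: force a decision (assumed stopping rule).
    return ("above" if llr > 0 else "below"), max_items


def average_sample_number(theta, delta, n_sim=2000, seed=0):
    """Monte Carlo estimate of the ASN at one ability level."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sim):
        _, n = sprt_classify(theta, delta=delta, rng=rng)
        total += n
    return total / n_sim


if __name__ == "__main__":
    for delta in (0.3, 0.8, 1.0):
        print(delta, average_sample_number(0.0, delta))
```

Evaluating average_sample_number over a grid of ability values gives a
simulated ASN curve; tallying the proportion of "above" decisions at each
ability value in the same way gives a simulated OC curve.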

Also  of  note,  when  comparing  the  ASN  functions,  is  the  substantial 
reduction  in  the  level  of  these  functions  when  proceeding  from  the 
theoretical  curve,  to  the  one-parameter  curves,  to  the  three-parameter 
curves.    This  reduction  is  attributed  to  the  advantages  of  rationally 
selecting  items  as  opposed  to  randomly  selecting  them.    Since  the 
three-parameter  model  has  more  information  to  use  for  selection,  fewer 
items  are  needed  to  reach  a  decision.    This  is  probably  the  most  positive 
finding of this research for criterion-referenced measurement.

Two  general  conclusions  can  be  drawn  from  these  results.  First, 
the  SPRT  has  been  shown  to  work  reasonably  well  using  a  tailored  testing 
model.    Some  loss  of  precision  is  present  due  to  the  stopping  rules 
used,  but  the  procedure  seems  viable.    Second,  the  SPRT  when  used  with 
tailored  testing  has  been  shown  to  classify  relative  to  a  cutting  score 
with  amazingly  few  items.    Of  course  this  finding  is  based  on  simulation 
results  rather  than  live  testing,  but  the  promise  of  efficient  and 
accurate  classifications  lends  impetus  for  future  research.  Certainly 
these  findings  should  be  checked  with  live  subjects  to  determine  if  the 
results are transferable to practical settings.  However, based on the
information presented here, the combination of tailored testing and
the sequential probability ratio test should be considered as promising
techniques for decision making in criterion-referenced testing.


A  METHODOLOGY  TO  EVALUATE  THE  APTITUDE
REQUIREMENTS  OF  AIR  FORCE  JOBS

Air  Force  Human  Resources  Laboratory
Brooks  AFB,  Texas


I.  INTRODUCTION 

Aptitude  requirements  for  entry  into  various  Air  Force  career 
ladders  are  presently  determined  in  part  by  the  judgement  of  responsible 
personnel  and  in  part  by  tradition  or  precedent.    A  precise  correspondence 
between  the  aptitude  scores  of  Air  Force  personnel  and  the  aptitude 
requirements  of  Air  Force  jobs  is  extremely  important  since  the  Air 
Force  recruits  a  fixed  amount  of  talent  every  year  and  there  is  more 
demand  for  this  talent  than  one  might  expect.    There  exists  an  additional 
requirement for contingency plans should the talent pool shrink or
offer fewer highly talented individuals.  If such shortages were to
occur, which specialties could tolerate lower aptitude requirements?
Which specialties could be shredded out into different job types, some
requiring  high  level  talent  and  some  low  level  talent?    Cost  effectiveness 
enters  the  picture  also.    Even  assuming  the  current  talent  remains 
unchanged,  it  may  be  more  cost  effective  to  shred  some  specialties 
into  jobs  with  varying  aptitude  requirements  because  of  differences 
in  the  actual  tasks  performed. 

More  precise  information  about  aptitude  requirements  will  have 
many repercussions for the Air Force personnel system, including
procurement and training.  A decision to lower the aptitude entry level
for  a  given  specialty  could  have  devastating  effects  on  the  attrition 
rate  for  the  corresponding  training  course  if  no  change  is  made  in 
the  course  curriculum.    For  example,  if  an  electronics  course  was 
designed  for  personnel  with  an  Armed  Services  Vocational  Aptitude 
Battery  (ASVAB)  score  of  E-80  or  better,  the  existing  training  program 
is  very  likely  to  be  too  difficult  for  those  with  lower  aptitudes. 
However,  the  aptitude  level  required  to  be  successful  in  the  training 
course  may  or  may  not  be  the  same  level  required  for  success  in  learning 
how  to  perform  the  job.    It  is  consequently  possible  for  the  Air  Force 
to  waste  talent  by  assigning  high  aptitude  personnel  to  specialties 
that  do  not  require  high  aptitudes;  and  to  frustrate  Air  Force  personnel 
by  assigning  them  to  jobs  that  do  not  fully  utilize  their  talents 
while  simultaneously  neglecting  other  specialties  in  which  talent 
is urgently needed.

The Air Force Human Resources Laboratory (AFHRL) has initiated
the first systematic study to fully evaluate the aptitude requirements
of Air Force specialties.  The approach, originated by Dr. Raymond
E. Christal, uses measures of learning difficulty at the task level
to infer aptitude.  The methodology was developed in an evolutionary
manner from research documented by Fugill (1972, 1973).  Christal
(1973) as well as Maginnis, Uchima and Smith (1975) have further described
this technology.  The present paper will describe the development
of task difficulty benchmark scales and their application, and will include
a brief discussion of the results.

II.    BASIC  CONCEPTS 

Task  Difficulty 

Task  difficulty  was  operationally  defined  in  terms  of  the  time 
it  takes  to  learn  to  perform  a  task  satisfactorily.    Based  on  Fugill's 
demonstration  (1972)  of  high  relationships  between  task  difficulty 
and  task  aptitude  (r>.89),  this  research  has  been  conducted  under 
the  assumption  that  the  aptitude  level  required  to  learn  a  job  can 
be  inferred  from  task  difficulty,  as  defined  above,  of  the  tasks  that 
make  up  the  job. 

Benchmark  Scales 

A technique was required that would allow for the comparison of
the learning difficulty of tasks both within and across Air Force
specialties.  A difficulty scale, using one or more tasks at each
point as examples of that level of difficulty, would fill this
need.  Table 1 presents a simple example of such a scale.  Task-anchored
benchmark scales were demonstrated to produce more reliable ratings
of several task factors than did numerically anchored scales in a
study by Peters and McCormick (1966).  The feasibility of using task
difficulty benchmark scales has been demonstrated by Fugill (1972,
1973).

Table  1.     Example  Benchmark  Scale 

Level  1  -  Very  Low  Task  Difficulty 
Visually  inspect  batteries 

Level  2  -  Low  Task  Difficulty 
Check  fuse  indication 

Level  3  -  Average  Task  Difficulty 

Adjust  transmissometer  projector  lamp  voltages 

Level  4  -  High  Task  Difficulty 

Trouble-shoot  wind  measuring  sets 

Level  5  -  Very  High  Task  Difficulty 

Trouble-shoot  aircraft  flight  control  circuits 


Aptitude  Areas 


There  are  four  aptitude  areas  in  the  Air  Force  personnel  testing 
system:    general,  administrative,  mechanical  and  electronics.  This 
research  does  not  question  the  appropriateness  of  these  areas;  it 
is  concerned  with  the  relative  order  of  aptitude  area  score  requirements 
for  specialties  and  jobs  within  each  of  those  areas. 

III.    DEVELOPMENT  OF  BENCHMARK  SCALES 

Task  difficulty  benchmark  scales  have  already  been  developed 
for  the  electronic,  mechanical  and  general  aptitude  areas.    The  approach 
was  similar  for  all  scales,  but  the  mechanical  scale  will  be  used 
as  an  example. 

A  general  description  of  the  scale  development  effort  was  presented 
by  Hart  (1977)  at  last  year's  Military  Testing  Association  Conference 
in  San  Antonio.    The  15  specialties  shown  in  Table  2  were  selected 
for  the  mechanical  scale  development.    These  specialties  are  representative 
both  of  the  complexity  and  the  variety  of  tasks  within  the  mechanical 
aptitude  area. 


Table 2.  Mechanical Specialties
(N Task and ASVAB Cut Off)

                                                           Mech ASVAB
Air Force Specialty                            N Task       Cut Off

464X0-Explosive Ordnance Disposal Spec.          551           60
431X0-Helicopter Mech.                           577           50
542X2-Electrical Power Production Spec.          592           50
546X0-Liquid Fuel Systems Mech.              [illegible]       50
427X1-Corrosion Control Spec.                    457           50
361X0-Outside Wire and Antenna Mech.             476           40
423X2-Aircrew Egress Systems Mech.               376           40
423X3-Aircraft Fuel Systems Mech.                297           40
426X2-Jet Engine Mech.                           415           40
552X0-Carpenter                                  563           40
552X5-Plumber                                    407           40
566X1-Environmental Support Spec.                556           40
551X1-Construction Equip. Operator               927           40
427X3-Fabrication and Parachute Spec.            553           40
551X0-Pavements Maint. Spec.                     927           40




Table 3.  Estimates of Interrater Reliability

Specialty                                       N (Rater)       Rkk

464X0  Explosive Ordnance Disposal Spec.       [illegible]  [illegible]
431X0  Helicopter Mech.                        [illegible]      .97
542X2  Electrical Power Production Spec.       [illegible]  [illegible]
546X0  Liquid Fuel Systems Mech.               [illegible]  [illegible]
427X1  Corrosion Control Spec.                 [illegible]  [illegible]
361X0  Outside Wire and Antenna Mech.          [illegible]  [illegible]
423X2  Aircrew Egress Systems Mech.            [illegible]      .88
423X3  Aircraft Fuel Systems Mech.             [illegible]  [illegible]
426X2  Jet Engine Mech.                        [illegible]  [illegible]
552X0  Carpenter                               [illegible]  [illegible]
552X5  Plumber                                     116          .97
566X1  Environmental Support Spec.                  56          .94
551X1  Construction Equip. Operator                 83          .97
427X3  Fabrication and Parachute Spec.              73          .94
551X0  Pavements Maint. Spec.                       72          .97

Relative  ratings  of  task  difficulty  are  routinely  obtained  in 
conjunction  with  job  inventories  and  occupational  surveys  conducted 
by  the  USAF  Occupational  Measurement  Center,  Lackland  AFB.    These  data, 
obtained  from  incumbent  supervisors,  are  collected  on  all  tasks  in 
the  job  inventories  and  are  provided  to  AFHRL  for  research  purposes. 
Table 3 reflects the estimates of interrater reliability (Lindquist,
1953)  and  the  number  of  raters  for  the  15  mechanical  specialties.  Using 
these  data  and  the  criteria  outlined  in  Table  4,  forty  tasks  were  selected 
from  each  specialty  to  establish  a  set  of  600  benchmark  tasks. 
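
The reliability of the mean rating of a group of raters can be estimated
from a tasks-by-raters analysis of variance.  The sketch below shows one
common form of that estimate, Rkk = (MS tasks - MS residual) / MS tasks;
whether this matches the exact Lindquist (1953) computation used for
Table 3 is an assumption, and the function name and data layout are
hypothetical.

```python
def reliability_of_mean_ratings(ratings):
    """Estimate the reliability of the mean of k raters.

    ratings[t][r] is the rating given to task t by rater r; every rater
    is assumed to have rated every task (a complete matrix).
    """
    n_tasks = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_tasks * k)

    task_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[r] for row in ratings) / n_tasks for r in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_tasks = k * sum((m - grand) ** 2 for m in task_means)
    ss_raters = n_tasks * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_resid = ss_total - ss_tasks - ss_raters

    ms_tasks = ss_tasks / (n_tasks - 1)
    ms_resid = ss_resid / ((n_tasks - 1) * (k - 1))
    return (ms_tasks - ms_resid) / ms_tasks


if __name__ == "__main__":
    # Four tasks rated by three raters (made-up numbers for illustration).
    example = [[1, 2, 1], [3, 3, 4], [5, 4, 5], [2, 1, 3]]
    print(round(reliability_of_mean_ratings(example), 3))
```

Removing the rater main effect, as done here, treats a rater who is
uniformly lenient or severe as still agreeing with the others on the
relative difficulty of tasks.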

Table  4.    Task  Selection  Criteria 

1.  Eliminate  supervisory  tasks 

2.  Capture  range  of  difficulty 

3.  Select  on  High  Rater  Agreement  (Low  SD) 

4.  Tasks  performed  by  first  termers 

5.  Prefer  well  known  tasks 

6.  Prefer  easily  observed  tasks 

7.  Face  validity


In  preparation  for  selecting  the  tasks  from  the  benchmark  set 
to  represent  the  25  points  on  the  benchmark  scale,  a  panel  of  mechanical 
experts,  provided  by  an  Air  Force  contractor,  was  asked  to  provide 
a  rank-ordering  of  the  600  tasks.    Each  panel  member,  after  accumulating 
detailed  information  on  each  task,  provided  an  independent  rank-order 
of  the  set  of  600  tasks.    The  task  requiring  the  least  learning  time 
was  assigned  number  1  and  the  task  requiring  the  greatest  learning 
time  was  assigned  number  600.    The  estimate  of  interrater  reliability 
was very high (Rkk = .97, N = 8).  This result demonstrates that a panel
of  work  area  experts  can  work  within  our  definition  of  task  difficulty, 
collect  detailed  information  in  the  field  at  the  task  level,  and  provide 
highly  reliable  rank  orderings  of  a  large  number  of  tasks  selected 
from  a  given  specific  work  area. 

To  address  the  matter  of  validity,  the  contractor's  ranking  data 
were correlated with the field supervisors' relative ratings referred
to  earlier.    These  correlations  were  computed  using  mean  ranks  and 
ratings  on  the  forty  tasks  from  each  of  15  specialties  separately; 
results  are  summarized  in  Table  5.    These  coefficients  provide  some 
substantiation  of  the  validity  of  the  data  collection  procedure,  the 
definition  of  learning  difficulty,  and  of  the  data  itself. 
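
The validity check described above can be sketched in two steps: average
each task's ranks over panel members and its ratings over supervisors,
then correlate the two sets of task means.  The example below is an
illustration only; the function names and data layout are hypothetical,
and a product-moment correlation is assumed.

```python
import math


def mean_per_task(scores_by_judge):
    """Average each task's scores over judges.

    scores_by_judge[j][t] is the score judge j gave to task t.
    """
    n_judges = len(scores_by_judge)
    n_tasks = len(scores_by_judge[0])
    return [sum(judge[t] for judge in scores_by_judge) / n_judges
            for t in range(n_tasks)]


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up data: two panel members ranking, two supervisors rating.
    panel_ranks = [[1, 2, 3, 4], [2, 1, 3, 4]]
    supervisor_ratings = [[2, 3, 5, 6], [2, 3, 5, 8]]
    mean_ranks = mean_per_task(panel_ranks)
    mean_ratings = mean_per_task(supervisor_ratings)
    print(round(pearson_r(mean_ranks, mean_ratings), 3))
```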


Table 5.  Correlations between Mean Ranks
and Mean Ratings of Forty Tasks

Specialty        r

464X0           .87
431X0           .91
542X2           .87
546X0           .85
427X1           .81
361X0           .77
423X2           .83
423X3           .79
426X2           .74
552X0           .76
552X5           .57
566X1           .76
551X1           .82
427X3           .81
551X0           .73



Benchmark  Task  Selection 


Two tasks were selected to represent each of the learning difficulty
levels of the 25-point scale.  A systematic procedure was developed
to ensure that the selected tasks represented the distribution of the
mean ranks of the 600 tasks.  In addition, the criteria summarized
in Table 4 were again applied as appropriate.  Face validity was even
more important in this task selection process than it was in the prior
process inasmuch as the tasks were to be used as examples that would
anchor  the  various  points  on  the  scale.    That  is,  the  tasks  on  the 
mechanical  scale  must  appear  to  be  mechanical  tasks  to  the  extent  possible. 

A sample of the 50 selected tasks (two for each of 25 points),
along with the mean and standard deviation from the ranking process, is
shown in Table 6.  The mean standard deviation for all 600 tasks was 62.8.
Table 6 indicates the type of tasks selected as well as the relatively
high rater agreement for most of them.


Table 6.  Example Benchmark Tasks - Mechanical Scale

Level  Task Title                                          Mean       SD

  1    Police Grounds for Litter                            1.50      .87
  1    Police Open Storage Areas                            3.50     1.73
  5    Clean Life Preservers                               26.38    13.77
  5    Dig Ditches by Hand                                 27.00    14.41
 10    Clean or Regap Spark Plugs                         136.38    53.97
 10    Caulk Areas Around Windows, Sinks or               140.63   105.52
       Bathtubs
 13    Install or Replace Water Fountains                 307.38    77.31
 13    Disassemble or Clean Conventional Fuel             306.13    83.64
       Gate Valves
 15    Perform Preoperational Inspections of              401.63    88.07
       Engine after Engine has been on long
       Standby
 15    Install or Replace Formica on Countertops          404.13    74.44
       or Splashboards
 20    Install Tail Rotor Assemblies on                   562.50    24.09
       Helicopter Aircraft
 20    Read and Interpret Schematic or Wiring             562.00    58.41
       Diagrams
 25    Troubleshoot Installed Engines                     599.38     1.32
 25    Troubleshoot Systems for Breaker Trip-Outs         595.38     5.20


IV.    PROCEDURAL  GUIDE


Accurate  application  of  the  benchmark  scale  requires  detailed 
knowledge of both the task being rated and the reference tasks at each
level  of  the  scale.    A  procedural  guide  has  been  assembled  for  each 
scale  describing  the  reference  tasks.    This  guide  is  for  the  use  of 
the  panel  of  expert  raters  who  will  actually  apply  the  scales.  There 
are  two  parts:    Part  I  introduces  each  panel  member  to  the  task  of 
assessing  learning  difficulty  and  rating  the  tasks;  Part  II  presents 
the  25-point  scale  and  provides  a  one  page  description  of  each  of  the 
50  tasks  on  the  scale.    This  description  includes  the  level  of  the 
task  on  the  scale,  the  title  of  the  task,  the  specialty  from  which 
it  was  selected,  a  narrative  on  any  specific  equipment  associated  with 
the  task,  a  narrative  describing  the  actual  task  performance,  and  an 
explanation  of  the  skill  and  knowledge  required  to  learn  the  task. 
Examples  of  these  descriptions,  taken  directly  from  the  Mechanical 
Procedural  Guide  (Hart  and  Pulliam,  Note  1),  are  at  Figures  1  and  2. 

Figure  1.    Level  10  Task  Description 

Level 10:  CLEAN AND REGAP SPARK PLUGS (Electrical Power Production
Specialist - AFSC 54350)

Equipment:  The task concerns gasoline engines of one or two cylinders,
driving service equipment such as air compressors.  These engines are
part of the support equipment in an electrical power generating station.

Task Description:  The task requires standard hand tools and an air
blast powered spark plug cleaner which blows an abrasive against the
plug base to clean insulator and electrodes.  Work is performed in
the power station.  The mechanic removes plugs from the engine, using
a socket wrench.  He cleans the plug by inserting it into a hole on
the cleaning machine, and pressing a valve to release a blast of abrasive
against the plug base.  After a few seconds he removes the plug, inspects
it visually for clean ceramic, and (on some machines) inserts it in
a second hole for a pressure test.  Defective plugs are thrown away.
He then checks the gap using a gap gauge (with feeler wires), and corrects
any error by bending the outer electrode inward, using a slotted wrench
which is often part of the gap-gauge handle.  He puts a new plug gasket
on the plug and torques the plug back in place.

Skill/Knowledge Required:  The task requires knowledge of standard
hand tools, including a torque wrench.  Since there is likely to be
no T.O. for the engine concerned, the mechanic must know the general
procedure for cleaning and gapping a plug, and that 25 foot-pounds
is the usual plug torque.  Airmen who qualify for entry into this field
usually have some knowledge of this task before their enlistment.


Figure 2.  Level 25 Task Description

Level 25:  TROUBLESHOOT INSTALLED ENGINES (Jet Engine Mechanic -
AFSC 42652)

Equipment:  This task is performed on jet engines installed on aircraft.
Troubleshooting includes isolation of failure within the engine or
confirming that a failure is not in the engine but some related subsystem.

Task Performance:  Troubleshooting typically begins with a pilot write-up.
Interpretation of these write-ups is often difficult.  The isolation
process depends upon the failure symptom observed.  Oil leaks, which
are the most common problems, require that all oil be cleaned from the
exterior of the engine; the engine and oil systems are isolated by
attaching vibration sensors at different locations around the engine
and then running the engine to look for abnormal vibration sources.
Other problems such as fuel leaks, throttle rigging, fuel control,
and electrical problems require coordination with other subsystem
specialties to isolate the problem between the engine and related
systems.

Skill/Knowledge Required:  Learning troubleshooting is accomplished
by exposure and is not formalized.  It requires:

(a)  A complete knowledge of engine operation and its interface
with related aircraft subsystems.

(b)  Ability to use and understand the readings of pressure gauges,
vibration sensors, and heat gauges.

(c)  That the mechanic be cockpit qualified to enable him to run
up the engine.

(d)  An ability to read and interpret the appropriate Technical
Orders.

(e)  Coordination with the efforts of other subsystem specialists
to isolate problems in the interaction of the engine and related aircraft
systems.

It  is  mandatory  for  each  rater  to  fully  absorb  the  contents  of 
the  guide  prior  to  using  the  scale.    Part  I  of  the  guide  calls  for 
a  practice  period  of  actual  study  and  application  prior  to  operational 
use  of  the  scale. 

V.    APPLICATION  OF  BENCHMARK  SCALES 

The  intention  is  to  ultimately  apply  the  scales  to  all  available 
enlisted specialties in the Air Force.  Data collection and analysis
are underway.  Because analysis is not complete, information to finalize
the  evaluation  of  the  aptitude  requirements  in  specific  specialties 
is  not  yet  available.    Presented  here  is  a  brief  discussion  on  how 
the  method  is  to  be  applied. 


Typically 60-70 tasks are selected from each specialty to be
evaluated.  These tasks will be selected using criteria similar to
those used in selection of the benchmark set.  The tasks will be
individually studied in depth at both the technical school and at two or
more operational work sites.  A typical panel will be made up of 12
members with two teams of six visiting separate locations.  After
accumulating as much data as feasible on each task, the panel members
will independently provide 1-25 point ratings of learning difficulty
for all 60-70 tasks in each specialty.  These ratings (for a sample
of tasks within each specialty) can be used to estimate the learning
difficulty of all tasks in a specialty using traditional statistical
procedures for estimations.

VI.    DATA  ANALYSIS 

The  Comprehensive  Occupational  Data  Analysis  Program  (CODAP)  package 
developed  by  AFHRL  is  the  data  analytic  tool  being  used  in  the  analysis 
of  these  data.    The  CODAP  system  is  ideally  suited  for  this  job.  Programs 
are  readily  available  to  provide  all  necessary  analysis  for  the  project. 

The contractor's benchmark ratings and the supervisors' relative
ratings of the same 60 tasks are input to a two-variable multiple regression
problem for each specialty.  The resulting equation is then to be applied
to the supervisors' relative ratings of all tasks in the specialty.
This  process  will  result  in  the  prediction  of  a  1-25  point  rating  mean 
for  each  task  in  the  specialty.    These  predicted  difficulty  levels 
are,  in  turn,  used  as  input  to  the  CODAP  system  for  the  computation 
of  average  task  difficulty  for  a  variety  of  groups,  and  job  types  within 
each  occupation.    For  example,  the  average  task  difficulty  for  first 
term  airmen  will  be  computed  for  each  specialty  and  will  be  comparable 
across  all  specialties  in  an  aptitude  area.    Similar  computations  will 
be  made  on  other  combinations  of  tasks  and/or  job  incumbents. 
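
The regression step described above can be sketched as follows.  This is
an illustration under stated assumptions rather than the CODAP
implementation: a least-squares line is fitted to the sampled tasks,
predictions are clipped to the 1-25 range of the scale (an assumption),
and the average task difficulty of a group is taken as the unweighted
mean over the tasks its members perform (also an assumption).  All
function names are hypothetical.

```python
def fit_line(relative, benchmark):
    """Least-squares fit of benchmark rating on relative rating.

    Both lists cover the same sampled tasks.  Returns (intercept, slope).
    """
    n = len(relative)
    mx = sum(relative) / n
    my = sum(benchmark) / n
    sxx = sum((x - mx) ** 2 for x in relative)
    sxy = sum((x - mx) * (y - my) for x, y in zip(relative, benchmark))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_benchmark(relative_all, intercept, slope, low=1.0, high=25.0):
    """Predict a 1-25 benchmark rating for every task in the specialty."""
    return [min(high, max(low, intercept + slope * x)) for x in relative_all]


def average_task_difficulty(predicted, task_indices):
    """Mean predicted difficulty over the tasks performed by a group."""
    return sum(predicted[i] for i in task_indices) / len(task_indices)


if __name__ == "__main__":
    # Made-up sample: relative ratings and benchmark ratings of four tasks.
    intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    # Apply the equation to the relative ratings of all tasks.
    predicted = predict_benchmark([5, 6, 20], intercept, slope)
    # Average difficulty for a group performing the first two tasks.
    print(average_task_difficulty(predicted, [0, 1]))
```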

VII.     PRELIMINARY  RESULTS 

The  analysis  completed  to  date  has  resulted  in  demonstration  of 
the  efficacy  of  the  method.    Interrater  agreement  estimates  with  12 
raters rating 60 tasks from each specialty have ranged from .88 to
.98.  These results have convinced us that the scale, in hand with
the procedural guide, can be reliably applied by knowledgeable work
experts.

Some  preliminary  correlational  analysis  has  been  completed  with 
positive  results.    Correlations  between  the  two  teams  of  raters  have 
ranged  from  .82  to  .94.    Correlations  between  the  ratings  of  relative 
difficulty  and  the  benchmark  ratings  are  ranging  from  .71  to  .94.  Both 
of  these  results  are  indicative  of  the  validity  of  our  methodology. 
Further  data  collection  and  analysis  will  be  much  more  conclusive. 
An  illustration  of  the  planned  format  of  the  data  is  provided  in  Figure  3. 



Figure 3.  Relative Aptitude Requirements for First Term Jobs
in 8 Specialties (Hypothetical Data)

[Bar chart: Relative Difficulty (bar = ± 1 SD) for each specialty,
horizontal axis marked from 200 to 425 in steps of 15]

Current                    Recom.
ASVAB Min.      AFS      ASVAB Min.

    60           A           60
    40           B           60
    60           C           50
    60           D           50
    50           E           50
    60           F           40
    40           G           40
    40           H           40



A brief comparison of the column containing current ASVAB minimum with
the column reflecting recommended ASVAB minimum indicates that there
is evidence of misalignment of the aptitude requirements of these eight
specialties.  Specifically, Figure 3 indicates that some specialties
may have a high current minimum aptitude requirement but may actually
have a much lower required minimum (e.g., specialties D and F).
The opposite is true for specialty B.  Other specialties will be found
to cover an extremely wide range of jobs (indicated by the length of
the horizontal lines on Figure 3), suggesting that the specialty itself
might be shredded out in some fashion.  The information contained
in Figure 3 is not based on actual data, but data of this type will
soon be available on approximately 200 specialties.  Changes in aptitude
requirements require a total systems approach, and we do not intend
to release any data in a piece-meal fashion.
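
The comparison of the two columns of Figure 3 amounts to a simple
classification of each specialty.  The short sketch below uses the
hypothetical values shown in Figure 3 (they are not actual data); the
function name and category labels are illustrative assumptions.

```python
# Hypothetical (current, recommended) ASVAB minimums from Figure 3.
FIGURE_3 = {
    "A": (60, 60), "B": (40, 60), "C": (60, 50), "D": (60, 50),
    "E": (50, 50), "F": (60, 40), "G": (40, 40), "H": (40, 40),
}


def classify_alignment(current, recommended):
    """Say how the recommended minimum compares with the current one."""
    if recommended < current:
        return "lower"      # current requirement is higher than needed
    if recommended > current:
        return "raise"      # current requirement is lower than needed
    return "aligned"


if __name__ == "__main__":
    for afs, (current, recommended) in sorted(FIGURE_3.items()):
        print(afs, current, recommended,
              classify_alignment(current, recommended))
```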

VIII.     CONCLUSIONS  AND  FUTURE  PLANS 

The  analysis  of  data  to  date  indicates  that  we  have  developed 
a  methodology  which  will  enable  us  to  evaluate  aptitude  requirements 
at  the  task,  job,  and  occupation  level.    The  benchmark  scale  approach 
results  in  the  collection  of  difficulty  data  at  the  task  level  that 
is  comparable  across  all  tasks  within  an  aptitude  area  regardless  of 
specialty.    The  results  of  the  data  analysis  to  date  are  sufficient 
to  conclude  that  the  total  technology  is  based  on  a  sound  approach 
and  analysis  methodology. 

There are studies in process that address the matter of longevity
of the data; that is, how long will these data reflect the requirements
of the specialty?  Preliminary results indicate that the contractor
benchmark data may be useful in assessing the learning difficulty of
the specialty for several years.  The difficulty scale is anchored
with tasks that should not easily become obsolete because of the task
selection process.  First, to the extent possible, tasks were selected
that were well known to mechanical workers; and second, extreme care
was used in documenting each task in the procedural guides.  Primarily
for these reasons, it is not necessary for the tasks on the scales
to even remain in active occupational task inventories to be effective.
The scale will remain an effective tool as long as experts in the work
area can comprehend the terminology used and the written documentation
provided in the procedural guide.  Not only will the scale and the
benchmark data be useful in years to come, but the scales as they are
will also be useful in examining the difficulty level of future tasks
as they are added to job inventories.  This procedure will allow the
evaluation of the aptitude requirements of new specialties and/or tasks
as they become a part of Air Force work.

Implementation of the results of this project is anticipated in
FY 80 or 81.  The primary procedure for implementation is to change
the aptitude minimums as listed in AF Regulation 39-1.  The results
will also be implemented through the computerized job-offer system
used by the AF Recruiting Service.  Plans for this form of implementation
are currently being prepared.  We also plan to develop a total
implementation package that will include complete impact analyses with
recommendations for coordinated changes in the length and difficulty of
Air Force resident school training courses.

There  are  three  significant  areas  where  cost  avoidance  should 
be  achieved  as  a  result  of  this  research.    Contingency  plans  for  talent 
shortages  will  be  available  as  a  product  of  this  effort.    These  plans 
will  enable  the  Air  Force  to  specifically  plan  for  talent  shortages 
in  any  specific  specialty  or  across  all  specialties.    Another  product 
will  be  a  more  defensible  position  for  aptitude  requirements  in  the 
case of court actions.  The present system, which excludes many
individuals from entering Air Force jobs based on a "cut-off" aptitude
score, has no objective data to support its use.  This research will provide
data  on  the  learning  load  requirements  for  each  job.    Another  product 
will  be  an  improved  match-up  of  Air  Force  talent  and  job  requirements. 
Improving  this  match  of  talent  with  requirements  can  have  effects  on 
job  attitude,  retention,  recruiting,  and  training,  to  name  just  a  few. 


OBJECTIVE EVALUATION OF CORRESPONDENCE COURSE ITEMS


Andrew  N.  Dow,  Ed.D. 

Naval  Education  and  Training  Program  Development  Center 
Pensacola,  Florida  32509 

Every  word  that  is  spoken  or  written  is  evaluated  to  some  extent 
by someone.  Those of us who prepare training materials feel more at
ease  when  we  get  evaluative  feedback  from  the  performance  of  our 
materials.    It  is  in  reply  to  such  quests  for  feedback  that  this 
paper  has  been  prepared.    However,  before  we  go  further,  we  must 
define  and  describe  our  subject  matter. 

In  this  presentation,  "correspondence  course"  refers  to  the 
series  of  interrogatories  that  accompanies  the  text.    The  individual 
interrogatories  are  the  items  of  the  correspondence  course.  This 
material  converts  a  book,  or  other  text  materials,  into  a  self- 
teaching  course. 

Most of the items which comprise the correspondence course bear
a strong resemblance to multiple-choice test items; some ask a
question that is followed by several possible answers, while others
contain a stem that is an incomplete statement followed by several
possible completions.  In spite of the superficial resemblance to
the typical objective test item, the primary purpose of the
correspondence course item is instruction.  Evaluation, which is the
primary  purpose  of  the  test  item,  becomes  the  secondary  purpose  of 
the  course  item.    Conversely,  instruction,  which  is  the  primary 
purpose  of  the  course  item,  is  the  secondary  purpose  of  the  test 
item.    It  must  also  be  recognized  that  some  course  items  are  more 
evaluative  than  others,  while  some  are  almost  pure  teaching  items, 
too  easy  to  have  any  evaluative  function. 

Regardless  of  their  function,  course  items  need  to  be  evaluated. 
An  item  that  is  unrelated  to  the  course  and  its  learning  objectives 
is  a  waste  of  paper  and  the  time  of  the  student.    Further,  that  item 
may  be  occupying  the  space  of  an  as  yet  unwritten  effective  item. 
Like  any  other  training  material,  the  items  of  a  correspondence 
course  can  be  evaluated  when  they  are  reviewed  by  knowledgeable 
persons.    Optimally,  the  person  who  originally  prepares  the  items 
takes  a  second  look  some  time  after  preparing  them,  and  a  co-worker 
also  gives  them  a  critical  review.    This  constitutes  internal  review. 
External  review  consists  of  critical  evaluation  of  one  or  more  items 
by an uninvolved person.

Evaluation by review has a number of shortcomings.  The most
obvious is the amount of manpower required to do a good job.  A single
review is time consuming; multiple reviews are more so.  Since there
is  a  good  chance  that  any  review  will  be  biased,  several  reviews  may 
overcome  the  bias  if  the  reviewers  hold  relatively  diverse  viewpoints. 
When  there  are  several  reviewers  of  diverse  viewpoints,  in  addition 
to  those  that  are  internal  and  involved,  there  are  those  that  are 
external  and  impersonal.    The  internal  reviewer  who  has  been  involved 
in  the  development  of  the  materials  brings  a  sophistication  that  is 
as necessary as the impersonality of the noninvolved external
reviewer.  With a diverse group of reviewers, there may be little
agreement;  someone  still  must  decide  which  criticisms  to  accept  or 
reject  and  must  synthesize  their  aspects.    All  of  this  increases  the 
manpower  demands  of  such  a  process,  and,  even  with  intersubjective 
agreement,  doubt  still  remains  as  to  its  objectivity  and  validity. 

A  system  of  item  evaluation  that  requires  somewhat  less  manpower 
is  based  upon  the  surface  resemblance  of  the  course  item  to  the 
typical objective examination item.  This system employs item
response counts as used in test item analysis (2).  These counts are
objective  and  reliable,  requiring  very  little  manpower,  but  there 
seems  to  be  no  consensus  as  to  the  meaning  of  the  counts,  nor  how 
they  can  be  used  to  improve  the  courses.    Thus,  it  is  appropriate 
that  we  look  at  some  of  the  possible  causes  of  some  of  the  several 
levels  of  correct  response  counts. 

Some  items  will  be  answered  correctly  by  almost  everyone,  e.g., 
giveaways— "When  was  the  War  of  1812  fought?"— which  one  can  answer 
without  having  taken  the  course  and  without  any  great  fund  of  general 
knowledge.    This  is  the  worst  kind  of  high  percentage  correct  item. 
The  best  kind  of  high  percentage  correct  item  is  one  that  is  well 
covered in the text; the text materials are comprehensive and
comprehensible, and the course item is not ambiguous.  These two kinds of
easy  items  are  the  extremes.    Other  items  will  be  answered  correctly 
by  a  high  percentage  of  the  respondents  because  the  items  are  based 
on  information  available  to  most  of  them— sometimes  called  common 
knowledge.    If  an  item  such  as  this  has  some  bearing  on  the  rest  of 
the  course,  there  may  be  justification  for  keeping  it.    If  a  common 
knowledge  item  is  related  only  vaguely  to  the  course  subject  matter, 
there  is  no  reason  to  retain  it.    Sometimes,  high  percentages  correct 
are the result of compromise: the word has gotten out on some of
the  items.    This  is  particularly  likely  to  occur  with  a  popular 
course.    If  compromise  is  relatively  universal,  then  a  rewrite  is  in 
order;  the  basic  material  can  be  covered  from  a  different  viewpoint. 
Thus,  we  have  four  of  the  possible  reasons  for  a  high  percentage  of 
correct  answers,  and  only  one  of  them  is  really  desirable  from  a 
pedagogical  point  of  view. 

A  large  number  of  items  will  be  answered  correctly  by  a 
moderately high percentage (60%-80%) of those taking the course.  An
obvious  reason  for  this  in  some  instances  is  that  the  text  covers 
the  material,  but  not  as  well  as  it  does  for  the  good  items  that  are 
correctly  answered  by  a  higher  percentage.    Some  others  will  fall 
into this group because the item is not well phrased; the text is
not at fault, the item is.  A third kind of item joins these others
just because its subject matter does not stimulate thought and/or
learning; both the text and the item are well worded and course
related, but the material just is not remembered well.  Ofttimes
this kind of material is part of a series of building blocks and is
essential to the course.  Some items will fall into the moderately
high percentage correct group because they are based upon general
knowledge that is not universal knowledge.  Others will wind up
here because they are relatively difficult but have been compromised
to a limited degree.

Then,  there  are  those  items  that  are  correctly  answered  by  only 
a  very  small  percentage  of  the  respondents.    Some  of  these  are  in 
this  group  because  of  exaggerations  of  the  conditions  that  produced 
the  items  for  the  moderately  high  percentage  correct  group.  In 
addition, some of the items are not answered correctly by many
respondents because the text does not cover the material well enough
for many to get the item correct.  Some other items fall into this
statistical group because the item is worded ambiguously, and most of
the respondents choose a wrong interpretation.  Also, there are some
items  that  very  few  answer  correctly  because  the  item  structure  is 
such  that  they  become  high-level  ability  items,  even  though  all  the 
material  needed  to  answer  them  can  be  found  scattered  about  the  text. 

We have just looked at some of the reasons that course items
are answered correctly by a certain percentage of those who take the
course.  This is not an exhaustive list of the reasons behind item
behavior, but it is a start.  Obviously, the item count percentages are
not diagnostic.  Without careful (subjective) analysis there is no way
to tell whether an item is adequate as it stands, needs some
revision, or should be thrown out.

Some of the dilemmas raised by the simple item analysis type
response count can be resolved by using quasi-experimental
designs (1) which incorporate several item counts.  We will first
describe three possible designs and then discuss the probable
outcomes of using each of them.  Each of these designs involves a
process analogous to pretesting; some students will go through the
interrogatory items of the course, answering each before studying
the text materials.  These same students will take the course after
being allowed to study; other groups will take the course under
varying sets of conditions.  These are described in the following
paragraphs.

The first procedure (QE 1) is comparable to a simple "Test-
Retest" design.  All persons who participate will take the course,
answering the items without having access to the text materials.
While the course is not, strictly speaking, a test, this
participation without study will be termed a "pretest."  Then, these same
participants will study the course text and answer the course items.
This will be known as the "post test."  The response counts from
these two uses of the course items yield three P values (percentage
of respondents answering correctly) for each item: a pretest P value
to be called Pre P; a post test P value to be called Post P; and a
differential P value derived by subtracting the Pre P for an item
from its Post P.  The differential P value will be designated Dif P.
The Pre P gives an indication of how much the item depends upon
common or precourse knowledge; a Pre P of 50 indicates that half of
the participants were able to select the correct response without
benefit of the course text.  The Post P of an item is an indication
of the general difficulty of the item, but does not show whether the
item was answered from general, precourse knowledge or from course
derived information.  Dif P indicates how well the item is related
to the text of the course; the larger the Dif P (in relation to the
Pre P), the more the item depends upon the text material.
Compromised items and items that can be answered from general knowledge
will  have  a  rather  low  Dif  P. 
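
The three P values of the first procedure can be illustrated with a
short sketch.  The function names and the response counts below are
hypothetical and are not taken from the paper; the sketch assumes only
that the same participants answer each item once before and once after
studying the text.

```python
def p_value(correct, total):
    """Percentage of respondents who answered an item correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def qe1_values(pre_correct, post_correct, n_participants):
    """Pre P, Post P, and Dif P for one item under the simple Test-Retest."""
    pre_p = p_value(pre_correct, n_participants)
    post_p = p_value(post_correct, n_participants)
    # Dif P is the Post P minus the Pre P for the same item.
    return {"pre_p": pre_p, "post_p": post_p, "dif_p": post_p - pre_p}


# Hypothetical item: 40 participants, 10 correct before study, 32 after.
values = qe1_values(pre_correct=10, post_correct=32, n_participants=40)
print(values)
```

With these assumed counts the item has a Pre P of 25, a Post P of 80,
and a Dif P of 55, which would suggest an item that depends largely on
the text.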

The second procedure (QE 2) is designated "Test-Retest with Post
Control."  In addition to participants as used in the first
procedure, this procedure calls for a control group.  These two groups
(group X, the experimental participants, and group C, the control)
should be selected or matched by one of the systems recommended by
Campbell and Stanley (1).  Group X is handled just like the
participants in the first procedure, and the data derived are of the same
type.  Group C takes the course in the regular fashion without a
pretest; the item counts from Group C should be representative of
the usual course takers.  The item count from Group C is designated
Post Pc, and that from the post test data from group X as Post Px.
Thus  Pc  and  Px  can  be  compared  for  each  course  item. 

A third procedure (QE 3) for the evaluation of course items is
called the "Test-Retest with Dual Control."  This procedure calls
for groups X and C as in the Test-Retest with Post Control and, in
addition, a second control group called group CC.  Groups X and C
participate the same way as in the Test-Retest with Post Control.
Group CC takes the items twice with group X, but does NOT have access
to the text for the second taking of the items.  The additional data
yielded by this procedure are labeled Pre Pcc, Post Pcc, and Dif Pcc.

Table  1  summarizes  the  three  procedures  and  compares  them  with 
taking  a  course  in  normal  fashion. 

Each  of  the  three  procedures  entails  more  work  than  a  simple 
count of the responses of a group of course-taking students.  What
benefits are derived from each of these procedures?  What are the
limitations of the three?  We shall attempt to answer these
questions by examining each of the three.

Table 1

Comparison of Normal Procedure of Course
Taking and Three Experimental Designs

Before examining the three quasi-experimental designs, we should
look  at  the  simple  count  of  responses.    The  simple  count  indicates 
the  percentage  that  responds  successfully  to  each  item  after  having 
had  access  to  the  text  materials.    This  count  does  not  indicate  how 
much  of  the  success  of  the  respondents  can  be  attributed  to  their 
exposure  to  the  text,  how  much  to  knowledge  that  they  had  before  they 
started  the  course,  and  how  much  to  incidental  learning  that  occurred 
concurrently with taking the course.  We have previously pointed out
some  of  the  reasons  for  an  item's  being  answered  by  a  given  percentage. 

The first of the procedures (QE 1), the simple Test-Retest, yields
three kinds of data:  Pre P, Post P, and Dif P.  The Pre P values give
a good indication of the extent to which the items depend upon
general knowledge that the students had before they started the course.
Ideally, these values are low, about 25 or less.  The Post P
indicates the general level of achievement after the course is completed
by students that have been primed by the pretesting.  Dif P is an
indication of how much the students improved during the period that
they were involved with the course proper.  Remember, this
improvement can be the product of experiences other than exposure to the
text and participation in the course items.
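
As an illustration only, the reading of Pre P and Dif P described
above could be turned into a list of questions for a human reviewer.
The ceiling of 25 for Pre P comes from the text; the cutoff for a "low"
Dif P and the function name are assumptions made for this sketch, and
nothing here accepts or rejects an item.

```python
PRE_P_CEILING = 25.0  # from the text: Pre P ideally "about 25 or less"
LOW_DIF_P = 10.0      # hypothetical cutoff, not given in the paper


def review_notes(pre_p, post_p):
    """Return points a reviewer may want to check for one item."""
    dif_p = post_p - pre_p
    notes = []
    if pre_p > PRE_P_CEILING:
        notes.append("high Pre P: item may rest on precourse knowledge")
    if dif_p < LOW_DIF_P:
        notes.append("low Dif P: little improvement during the course")
    return notes


print(review_notes(pre_p=70.0, post_p=75.0))
```

An empty list means only that neither assumed threshold was crossed;
the item itself still has to be read.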

The second procedure (QE 2), Test-Retest with Post Control,
yields, in addition to the data of the types yielded by the simple
Test-Retest, two sets of Post P values.  The post test P values
(Post Pc) are obtained from the responses of students who did not
take a pretest.  Therefore, they are free of any priming influence
which may result from taking the course items before being exposed
to the text.  These data are also free of any practice effect score
enhancement, so they should be representative of the data from
typical student groups.  A comparison of Post Px with Post Pc will
show the combined effects of practice and priming.

The third procedure (QE 3), Test-Retest with Dual Controls, in
addition to the data of the types yielded by the second procedure,
produces a set of P values from the first, or pre, administration, a
set of P values from the second, or pseudo post, participation in
the course items, and a set of differential P values.  These will be
designated Pre Pcc, Post Pcc, and Dif Pcc, respectively.  Pre Pcc
will be useful in evaluating the comparability of group X and group
CC; Dif Pcc can be used to establish how much of Dif Px is the
result of both practice and incidental learning, without the text
information.  Post Px - Post Pcc will give an indication of the
size of the performance increment that results from exposure to the
text materials.  Dif Pcc will indicate approximately how much of the
score is due to practice effect and incidental, after priming,
learning. 
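
The comparisons available under the dual-control design can be laid
out in a short sketch.  The function name, the dictionary keys, and the
P values used below are hypothetical; the differences themselves are
the ones named in the text.

```python
def qe3_comparisons(pre_px, post_px, post_pc, pre_pcc, post_pcc):
    """Differences among item P values for groups X, C, and CC."""
    dif_px = post_px - pre_px      # gain of group X (studied the text)
    dif_pcc = post_pcc - pre_pcc   # gain of group CC (no text access)
    return {
        # Near zero when groups X and CC start out comparable.
        "pre_px_minus_pre_pcc": pre_px - pre_pcc,
        # Practice effect and incidental learning without the text.
        "dif_pcc": dif_pcc,
        # Increment associated with exposure to the text materials.
        "post_px_minus_post_pcc": post_px - post_pcc,
        "dif_px_minus_dif_pcc": dif_px - dif_pcc,
        # Combined effects of practice and pretest priming.
        "post_px_minus_post_pc": post_px - post_pc,
    }


# Hypothetical P values for a single item.
result = qe3_comparisons(pre_px=20.0, post_px=80.0, post_pc=72.0,
                         pre_pcc=22.0, post_pcc=30.0)
print(result)
```

In this assumed case the two pretested groups start about even (a
difference of -2), group CC gains 8 points without the text, and the
post test difference of 50 points between groups X and CC is the part
associated with the text.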

Table  2  summarizes  the  three  procedures  and  the  kinds  of  data 
available  from  them.    There  are  also  comments  relating  to  the  data. 

We  also  realize  that  in  some  courses  and  situations  some  of 
these  procedures  are  not  practical.    Many  of  the  response  forms,  or 
answer  sheets,  used  with  correspondence  courses  are  not  readily 
adapted  to  automated  response  counting.    Data  from  such  forms  can  be 
hand  counted  or  key  punched  for  machine  counting. 

As  the  various  P  values  can  have  various  causes,  there  is  no 
way  that  a  computer  can  read  the  P  values  and  accept  or  reject  the 
items.    A  trained  eye  will  always  be  needed  to  look  at  the  several 
P  values  for  each  item  and  then  at  that  item;  afterwards,  decisions 
can be made.  An item with a high Pre P that has an instructional
function  should  be  retained.    There  also  may  be  a  reason  for  keeping 
an  item  with  a  very  low  Post  P. 

At  this  point,  there  are  many  unanswered  questions.    A  paper 
such  as  this  tries  to  open  new  avenues  rather  than  supply  pat 
answers.

Table 2

Data Acquired from Normal Procedure of Course
Taking and from Three Experimental Designs

Procedure  Data                  Comments

Normal     P                     % students responding correctly

QE 1       Pre P                 % students without training who
                                 respond correctly, related to
                                 noncourse knowledge
           Post P                % trained students responding correctly
           Dif P                 Increase in correct responses after
                                 training

QE 2       Pre Px                Same as Pre P, QE 1
           Post Px               Same as Post P, QE 1
           Dif Px                Same as Dif P, QE 1
           Post Pc               Same as P, Normal procedure
           Post Px - Post Pc     Increase in correct responses that
                                 result from pretest priming

QE 3       Pre Px                Same as Pre P, QE 1
           Post Px               Same as Post P, QE 1
           Dif Px                Same as Dif P, QE 1
           Post Pc               Same as P, Normal procedure
           Pre Pcc               Same as Pre P, QE 1
           Post Pcc              % correct responses that result from
                                 training and priming that comes from
                                 taking the pretest without exposure
                                 to the text materials
           Post Px - Post Pc     Same as in QE 2
           Pre Px - Pre Pcc      Checks quality of groups X and CC
           Post Px - Post Pcc    These show the effect that the text
           Dif Px - Dif Pcc      has upon making correct responses
           Post Pc - Post Pcc    Of little use in evaluating items


THE EMERGENCE OF AN ITEM-WRITING TECHNOLOGY


Gale  Roid  and  Tom  Haladyna 
Teaching  Research  Division 
Oregon  State  System  of  Higher  Education 
Monmouth,  Oregon  97361 


Abstract 


This paper provides a review of the emerging technology of test-item
writing for criterion-referenced tests.  Several different
approaches  to  item  development  are  discussed.    A  continuum  of  item- 
writing  methods  is  proposed  ranging  from  informal-subjective  methods  to 
computerized-objective  methods.    Examples  of  techniques  include 
objective-based  item  writing,  amplified  objectives,  item  forms,  facet 
design, domain-referenced concept testing and computerized techniques.
Data from studies of item-writing techniques are also reviewed.
Recommendations for further research and for applications to criterion-
referenced  testing  are  presented. 


Revised version of a paper presented at the annual meeting of the Military
Testing Association, Oklahoma City, October 1978.


Developers  of  any  criterion-referenced  (CR)  achievement  test  have 
been  confronted  with  the  problem  of  writing  test  items  which  closely 
reflect the intent of instruction.  This problem was earlier recognized
by  Osburn  (1968)  and  by  Hively  and  his  colleagues  (Hively,  Patterson  & 
Page,  1968).    Bormuth  (1970)  was  among  the  first  to  formally  propose  a 
science  of  item  writing  as  a  replacement  for  the  informal,  subjective 
experiences  that  commonly  form  the  basis  for  item  writing.     In  a  review 
of Bormuth's approach to item writing, Cronbach (1970) remarked:

The design and construction of achievement test items
has been given almost no scholarly attention.  The leading
works of the generation--even the Lindquist Educational
Measurement and the Bloom Taxonomy--are distillations of
experience more than scholarly analyses.  (p. 509)

Since  that  time,  there  has  been  increasing  activity  in  the  area  of  item 
writing which clearly indicates the emergence of an item-writing
technology that is grounded in theory and is now developing a research base.

The  objective  of  this  review  is  to  describe  the  progress  in  the 
technology  of  item  writing.    This  review  should  serve  to  stimulate 
theoretical  and  empirical  work  in  the  further  advancement  of  item-writing 
technology,  as  well  as  provide  useful  guidelines  to  instructors  and 
researchers who are interested in producing instructionally relevant
achievement  tests. 

It  is  important,  however,  to  provide  an  appropriate  background  for 
this  review.    Therefore,  the  steps  one  might  employ  in  the  construction 
of  appropriate  achievement  tests  are  presented,  and  the  role  that  tests 
play  in  systematic  instruction  is  briefly  described.    Two  distinct 
approaches  to  test-item  writing  are  presented  and  contrasted.  Studies 
are  reviewed  which  bear  on  the  feasibility  and  utility  of  each  approach 
in producing effective CR tests.  Finally, recommendations are offered
for  future  research  and  development. 


Developing  CR  Tests 

Five steps are identified that are essential aspects of achievement
test development (illustrated in Figure 1).  These steps reflect a
process which ideally occurs in any test development.  The first step
is the conceptualization of the content to be learned.  Initially, the
instructional developer or instructor must identify what the student must
learn as a consequence of instruction.  This step may be based on a task
analysis or job analysis, or it may be admittedly introspective.  It may
be a "private event" that is purely abstract, but it is a vital beginning
in the process of planning instruction and the associated CR testing
that is part of this instruction.

[Figure 1.  Steps in the development of a CR test.  Steps legible in
the figure: conceptualization of instructional intent; development of
instructional objectives OR domain specification; item development;
test construction.]

The  process  of  defining  content  has  been  the  subject  of  much  research. 
As Shavelson (Note 1) points out, there are at least three distinct ways
in which to describe content structure.  The first is hierarchically in
the manner suggested by Gagne (1962), Ausubel (1963) or Bruner (1966).  A
second  approach  is  content  analysis  whereby  a  system  is  used  to  categorize 
content.    A  third  approach  involves  the  defining  of  concepts  and  their 
relationships.  However, the state of the science here appears to be more
in the direction of informal, non-theoretical approaches to defining
content, rather than the theoretical positions described by Shavelson.

Following the conceptualization of what is to be taught, in step two
instructional intent is then transformed into either (a) instructional
objectives which represent the behaviors to be elicited by learners as a
result of instruction or (b) the specification of the content domain to
be learned.  Objectives, for quite some time, have been the mainstay of
CR testing, although recent statements like those of Popham (1975) and
Millman (1974a) have indicated that the inherent weakness in using
objectives is that they permit considerable freedom when creating items,
and studies like Roid and Haladyna (1978) offer empirical support to
this view.

In  the  third  step,  items  are  developed  using  one  of  a  number  of  item- 
writing  techniques.    The  object  in  step  three  is  to  develop  a  universe 
or  domain  of  test  items  which  adequately  represent  the  instructional 
intent  as  abstractly  conceived  in  step  one.    A  number  of  methods  have 
been  proposed  and  studied  for  developing  items  (e.g.,  the  method  of  item 
forms developed by Hively, 1974), and the process is very much in line
with the domain specification approach currently advocated by leading CR
test theorists (Hambleton, Swaminathan, Algina & Coulson, 1978; Millman,
1974a; Popham, 1975).

While these item-writing procedures may generate items automatically,
Haladyna and Roid (1978) and Hambleton, et al. (1978) have argued for
processes whereby items are reviewed either by logical or by empirical
means (step four).  These item reviews are intended to identify defective
items  and  either  revise  or  discard  such  items  before  they  are  employed  in 
CR  testing.    Thus,  the  resultant  item  domain  is  one  in  which  logical  and 
empirical  reviews  have  been  used  to  ensure  the  quality  of  the  items. 

The  final  step  in  test  development  (step  five)   is  the  selection  of 
items  for  a  CR  test.    While  test  blueprints  and  empirical   item  selection 
techniques  have  been  advocated  for  years,  there  is  strong  evidence  that 
random  sampling  of  items  should  occur  (Hambleton,  et  al.,  1978;  Popham, 
1975; Haladyna & Roid, Note 2).  Millman (1974b) provides some guidance
on types of random sampling plans that may be employed to provide the
adequate coverage of the content desired.  The practice ensures a high
degree  of  content  validity. 
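
A minimal sketch of the random selection step follows.  It shows only
simple random sampling without replacement from a reviewed item domain;
it does not reproduce any of the sampling plans discussed by Millman,
and the function name, item labels, and seed are hypothetical.

```python
import random


def sample_test_form(item_domain, n_items, seed=None):
    """Draw one test form by simple random sampling without replacement."""
    if n_items > len(item_domain):
        raise ValueError("form cannot be longer than the item domain")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(item_domain, n_items)


# Hypothetical domain of 50 reviewed items; draw a 10-item form.
domain = [f"item-{i:03d}" for i in range(1, 51)]
form_a = sample_test_form(domain, 10, seed=1)
print(form_a)
```

Drawing again with a different seed gives another form from the same
domain, which is one way to supply the multiple forms needed for
retests.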

Within  the  area  of  CR  testing  there  are  many  issues  to  be  studied 
and  resolved.    These  issues  include  (a)   item  review,  (b)  reliability, 
(c)  decision  making,   (d)  standard  setting,  and  (e)  validity.     Each  of 
these  issues  becomes  the  object  of  future  study.    However,  the  present 
review  is  focused  on  the  first  three  steps  of  CR  test  development,  which 
are  related  to  item  development.    CR  tests  appear  in  a  variety  of 
educational  settings,  but  the  most  appropriate  of  these  settings  would 
seem  to  be  in  instruction  that  is  objective-based  and  systematic  in 
nature.


Systematic Instruction and Systematic Testing


As  test  developers  are  aware,  a  good  CR  test  is  typically  used  in 
instruction  to  monitor  student  progress  with  respect  to  the  intent  of 
the  instruction.    There  are  a  number  of  instructional  systems,  e.g., 
mastery learning (Bloom, 1968); personalized instruction (Keller, 1968;
Robin,  1976)  which  treat  instruction  as  an  orderly  process  that  is  goal- 
based  and  student-centered. 

As  noted  earlier,  the  CR  tests  that  are  developed  are  created  by 
random  sampling  from  a  domain  of  items  representing  the  instructional 
intent, as illustrated in Figure 1.  An important distinction made by
Millman  (1974a)  is  that  two  types  of  CR  tests  exist— objective-based  and 
domain-based.    The  objective-based  CR  test  consists  of  items  which  were 
selected  or  written  to  reflect  a  single  objective  or  a  homogeneous  set 
of  objectives.    The  domain-based  CR  test  is  derived  from  a  specification 
of  the  domain  and  rules  for  the  development  of  items  which  do  not  permit 
a  great  deal  of  influence  by  the  item  writer  on  the  item,  thus  greatly 
eliminating  the  potential  for  item-writer  bias.    Millman  (1974a)  has 
stated  that  the  domain-based  test  is  the  purest  form  of  CR  test  and  a 
more  desirable  alternative  to  the  objective-based  test. 

Within  the  framework  of  systematic  instruction,  there  are  several 
reasons  for  the  increased  attention  to  item-writing  methods.  Foremost 
among  these  is  that  most  instructional  systems  need  large  collections  of 
CR  items  in  order  to  provide  students  with  multiple  forms  for  retests. 
When mastery is not achieved, instructors must provide suitable remediation
and give retests until mastery is achieved. The consequence of this
strategy,  which  seems  to  be  common  to  virtually  all  forms  of  systematic 
instruction,  is  that  a  large  collection  of  test  items  must  effectively  and 
logically  represent  instruction. 
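
The need for a large item pool can be pictured by sketching how retest forms might be drawn from it. The Python sketch below is an illustration only, not a procedure taken from the instructional systems cited above: the function name `draw_forms`, the item identifiers, and the form sizes are all hypothetical. It samples non-overlapping forms at random, so that a student who is retested never meets an item seen on an earlier form.

```python
import random

def draw_forms(item_pool, items_per_form, n_forms, seed=None):
    """Draw n_forms non-overlapping test forms at random from item_pool."""
    needed = items_per_form * n_forms
    if needed > len(item_pool):
        raise ValueError("item pool is too small for the requested forms")
    rng = random.Random(seed)
    # Sampling without replacement guarantees no item appears on two forms.
    drawn = rng.sample(list(item_pool), needed)
    return [drawn[i * items_per_form:(i + 1) * items_per_form]
            for i in range(n_forms)]

# Hypothetical pool of 60 item identifiers (an assumption for illustration).
pool = [f"item-{n:03d}" for n in range(1, 61)]
forms = draw_forms(pool, items_per_form=10, n_forms=3, seed=1)
```

Three ten-item forms already consume half of this sixty-item pool, which is one way to see why systematic instruction with repeated retesting calls for large collections of items.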

Another reason for increased attention to the development of items
is the role achievement tests play in research. Educational researchers
often must construct achievement tests to be used as dependent measures
in their studies. Anderson (1972), in a classic paper, maintains that
educational researchers tend to overlook the basic requirements of a
system of measurement, "namely that there is a clear and concise definition
of the things being counted" (page 145). This need can be extended
to the area of evaluation research where the effectiveness of instructional
programs is often determined by CR achievement tests that are not
specifically described.

When item writers create items for CR testing using informal or
subjectively inspired methods, they are likely to produce items which
vary in quality and difficulty (Bormuth, 1970). The use of objectives or
similar rules for item writing does not necessarily lead to better items.
As demonstrated in a study by Roid and Haladyna (1978), the inherent
subjectivity in item writing produces a bias that is difficult to overcome.

Another reason for concern with item development is that unless
test-item writing methods are operationally defined, these methods cannot
be documented for other researchers or educators. If the test-item
writer uses a mental process that cannot be described and communicated
to another educator, the process of item writing remains a private event
which is not defined and, hence, not replicable. An operationally
defined method provides a precise description of how items are written
so that two independent item writers using the same method produce
virtually identical items. These items also have an integral link to
instruction and to the intent of instruction.

Given  this  background,  two  fundamental  approaches  to  CR  test-item 
writing  are  identified  and  methods  for  writing  or  classifying  items 
are  described,  and  recommendations  are  offered  for  future  research  and 
development. 


All item-writing methods can be contrasted using a continuum which
ranges from informal-subjective to computerized-objective (illustrated
in Figure 2).


A CLASSIFICATION OF ITEM WRITING METHODS

1. Informal Methods

2. Writing from Learning Objectives

3. Writing from Detailed Learning Objectives

4. Writing from Item Generation Rules with Writer's Choice

5. Writing from Item Forms or Fully Computerized Methods

Figure 2. A continuum of item-writing methods

The informal methods may involve a listing of the topics to be
covered in the content of a course or may simply involve the instructor
sitting down and writing items that are felt to be relevant to the
course. At levels two and three of the continuum, learning objectives
or detailed objectives may be written for a course of instruction, and
these are used as guides in producing test items. At the fourth and
fifth levels of the continuum, there may be a domain specification or
universe of test items that is defined for a course of instruction and
the tests that are used with it (Hively, 1974; Shoemaker, 1975). Since
it is assumed that criterion-referenced tests are the appropriate tool
for assessing student achievement in systematic instruction and because
these tests are developed using either objectives or domains as the
starting point, the emphasis in this review will be on these two major
classifications of item-writing methods. The former subsumes levels two
and three of the continuum, while the latter subsumes levels four and
five.


Objective-Based  Methods 

Since the appearance of Mager's classic text, Preparing Instructional
Objectives (Mager, 1962), there has been a plethora of books
dealing with the subject. The purpose in this section of the review
will not be to show how to prepare objectives, but to evaluate the
contribution of objectives to CR item writing.

Simply  stated:    "An  objective  is  an  intent  communicated  by  a 
statement  describing  a  proposed  change  in  a  learner — a  statement  of 
what  the  learner  is  to  be  like  when  he  has  successfully  completed  a 
learning  experience"  (Mager,  1962,  p.  3).    The  key  concept  in  this 
definition  is  the  "intent"  which  is  the  raison  d'etre  for  the  objective. 
Given  the  objective,  the  test-item  writer  has  a  good  idea  what  was 
intended, and is guided in developing CR test items which are appropriate
to this intent. Further, objectives give organization to the
content to be learned and are believed to provide focus to learning
efforts. In fact, reviews by Duchastel and Merrill (1975), Hartley and
Davies (1976) and by Melton (1978) indicate that the use of objectives
does enhance learning, although the last of these authors warned that the
perceived effectiveness of objectives is an oversimplification in light of
the  conditions  that  existed  in  the  research  on  the  effectiveness  of 
objectives. 

Studies dealing with item characteristics of CR tests were recently
reviewed by Berk (Note 3) and by Haladyna and Roid (1978). These empirical
studies, besides providing a technical base upon which item review
may be performed, point to the deficiencies in the approach where objectives
are used to generate items. In one study (Roid & Haladyna, 1978),
two item writers used the same learning objectives as a guide in preparing
items, but one item writer was found to consistently write more
difficult items regardless of the objective. Thus, it seems that the
very same subjectivity and bias that is present when the item writer uses
his own intuitive notions can be present when objectives are used.

Dissatisfaction with the differences in items produced by item
writers who use the objectives has prompted some to reject objective-based
tests in favor of other "purer" forms of criterion-referenced
tests. Popham (1978, p. 91) states: "The thrust of the emerging
criterion-referenced measurement technology, therefore, is on increasing
the capabilities of criterion-referenced tests to produce lucid descriptions
of examinees performance." The objective-based CR test is viewed
as a weaker form of a CR test in contrast to the domain-based CR test
(Hambleton, et al., 1978; Millman, 1974a).

One solution to the problem of using objectives is the amplified
objective, which is an elaboration of the objective, and which reduces
uncertainty about the form and extent of items developed. An example is
provided from Popham (1975, p. 147) which shows how an objective is
transformed into an amplified objective (see Figure 3). The process
thereby transforms an objective-based item-writing method into a
domain-based method. That is, the amplified objective yields a pool of test
items with well-defined characteristics.

Instructional Quality Inventory. Another approach to improving
objectives is the Instructional Quality Inventory (IQI) developed by the
Navy Personnel Research and Development Center, San Diego, and Courseware,
Inc. (Ellis, Wulfeck II, Merrill, Richards, Schmidt & Wood, Note 4). IQI
provides a method for examining the consistency between test items,
objectives, and instruction. The IQI uses a matrix of test levels by
content types that allows the test developer to classify test items and
objectives in terms of both task and content. This examination of the
relationship between objective and instruction is a major advance in the
technology of item writing, although the degree to which the IQI has been
successfully implemented is undetermined. Nevertheless, IQI provides a
systematic approach to analyzing objectives which may allow for the
creation of satisfactory items.

Classifying educational objectives. Two approaches to creating and
classifying items and objectives will be briefly presented and reviewed.
The first one is the most well known, the cognitive taxonomy proposed
by Bloom and his colleagues (Bloom, Engelhart, Furst, Hill & Krathwohl,
1956). A second approach is a typology introduced by Williams and
Miller (1973).

Bloom's taxonomy consists of six categories ranging from knowledge,
which deals with factual recall, to the highest level, evaluation, which
involves judgment. The taxonomy has had tremendous impact on the thinking
and practices of educators, and any discussion of objectives is
incomplete without reference to this taxonomy. However, seldom are CR
tests employed which involve this cognitive taxonomy. In a recent
review of the properties of this taxonomy, Seddon (1978, p. 321)
concludes, "No one has been able to demonstrate that these properties do
not exist. Conversely, no one has been able to demonstrate that they
do." Thus, the utility of Bloom's taxonomy as a tool for CR test developers
is still questionable.

Descriptive Language: Concrete and Abstract Words Composition Skills

Objective: Given a sentence with a noun or verb omitted, the student
will select from two alternatives the word that most specifically
or concretely completes the sentence.

Sample Item:

Directions: Mark an "X" through one of the words in parentheses that
makes the sentence describe a clearer picture.

Example: The racer (tumbled, went) down the hill.

Amplified Objective:

Stimulus Elements:

1. The student will be given simple sentences with the noun or verb
omitted and will be asked to mark an "X" through the one word of
a given pair of alternative words that more specifically or
concretely completes the sentence.

2. Each test will omit nouns and verbs in approximately equal numbers.

3. Vocabulary will be familiar to a third- or fourth-grade pupil.

Response Alternatives:

1. The student will be given pairs of nouns or pairs of verbs with
distinctly varied degrees of descriptive power.

2. In pairs of verbs, one verb will either be a linking verb or an
active verb descriptive of general action (e.g., is, goes), and
one verb will be an action verb descriptive of the manner of
movement involved (e.g., scrambled, skipped).

3. In pairs of nouns, one noun will be abstract or vague (e.g., man,
thing), and one noun will be concrete (e.g., carpenter, computer).

Criterion of Correctness:

The correct answer will be "X" marked through the more concrete,
specific noun or through the more descriptive action verb in each pair.

Figure 3. Example of an amplified objective.

A typology for test questions was originally introduced by Williams
and Miller (1973) which viewed objectives and items particularly as
representations of one of five possible types. The term "typology" was
used as no order for these categories was implied. A fuller treatment
of this work (Miller, Williams & Haladyna, 1973) reveals a system much
like Bloom's which, instead, focuses on verbs in test questions as keys
to interpreting the cognitive category of behavior in which a test item
falls. The categories include factual recall, summarizing, predicting,
evaluating, and applying. The example in Figure 4 provides a brief
definition of each level and examples of questions at each level.
Williams (1977) added another category to this typology, instantiation,
which is a derivative of summarizing. His empirical study of the
typology revealed that students with a minimum of training could classify
test items with a high degree of accuracy. As with other systems for
classifying objectives and items, there is no empirical research to
support its use, other than Williams' study, and its usefulness at various
educational levels and content areas is unknown.

ITEM  DEVELOPMENT  BASED  ON  DOMAIN  SPECIFICATION 


The concept of "domain-referenced" testing was first reported in
1968 (Hively, Patterson & Page, 1968; Osburn, 1968) and further developed
by Hively and his colleagues (Hively, Maxwell, Rabehl, Sension & Lundin,
1973). As we reported earlier, domain-based CR tests are derived from
content specifications as opposed to objective-based tests which are
derived from instructional objectives.

A new technology of domain specification provides an alternative to
objective-based item-writing methods. There are at least five distinctly
different approaches to item creation which involve domain specifications.
These include: (a) item forms, (b) linguistic-based approaches,
(c) facet theory, (d) concept-based testing, and (e) computer-based
methods. Each is described, and research is reviewed which bears on the
success with which the method has been employed in CR testing.


Item  Forms 

Items for domain-based tests may be written from specifications
that describe the format and even some of the wording of the resulting
items. The specifications are called "item forms" (Hively, 1974), and
the pool of items that an item form creates is the domain to be assessed.

Type: Factual Recall
Definition: The reproduction of a stimulus element exactly as it was
presented.
Syntactical Forms: Name; State; Describe
Example of a Question: When did Columbus discover America?
a.
b.
c. 1776

Type: Summarizing
Definition: The understanding of concepts and the tendency to correctly
identify examples, instances, or attributes of the concept.
Syntactical Forms: Identify; Define; Translate; Typify; Represent; Describe
Example of a Question: What is a good example of alliteration?
a. gurgling
b. school - pool
c. blue - blood
d. up - down

Type: Predicting
Definition: The use of rules in contingent relationships. The student is
given a situation and must anticipate a consequence which is based on a
rule.
Syntactical Forms: If..., then....
Example of a Question: If the temperature of the fluid in the flask
exceeds 100° C, then
a. all fluids will evaporate.
b. the mixture will explode.
c. nothing will happen.

Type: Evaluating
Definition: The tendency to (a) select a criterion or criteria, (b) to use
a criterion, or (c) both select and use a criterion for a decision.
Syntactical Forms: Which is best, worst; highest, lowest; most, least...?
Example of a Question: From the standpoint of efficiency, which procedure
is best?
a. driggling
b. harpoling
c. craterling
d. quarboling

Type: Applying
Definition: Problem solving, which involves (a) sensing the problem,
(b) defining the problem, (c) selecting principles, rules, or methods by
which the problem is solved, and (d) selecting or generating solutions.
Syntactical Forms: No specific forms.
Example of a Question: To achieve a well-balanced city water system,
which plan will provide a steady supply of water in all seasons?
a. a deep well system west of town
b. a deep well system east of town
c. a reservoir in west hills
d. a pipeline from neighboring Independence

Figure 4. Definition, syntactical structure of questions, and examples of
a cognitive typology of CR test items.

"An item form," explains Osburn (1968, p. 97), "has the following
characteristics: (1) it generates items with a fixed syntactical
structure; (2) it contains one or more variable elements; and (3) it
defines a class of item sentences by specifying the replacement sets for
the variable elements." The item forms developed by Hively and Osburn
were in science or mathematics. For example, the following is an item
form for a basic mathematics concept:

Item Wording: Which of the following numbers is a prime number?
(a), (b), (c), (d)

Elements to Complete the Item: Four numbers, (a) to (d), are
provided and the student is required to check the
one that is the prime number. These numbers must
be two- or three-digit integers, and all must be
odd numbers. The foils must be non-trivial as
defined by the fact that they should be factorable
into a minimum of 3 factors.

Sample Item: Which of the following numbers is a prime number?
27, 31, 147, 189

Correct Answer: 31
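
Because an item form states its replacement sets explicitly, it can be read almost directly as a generation procedure. The Python sketch below is one possible reading, not Hively's or Osburn's own implementation. It assumes that "factorable into a minimum of 3 factors" means at least three prime factors counted with multiplicity (so 27 = 3 x 3 x 3 qualifies), and it lists the options in ascending order as in the sample item. The function names are hypothetical.

```python
import random

def prime_factors(n):
    """Return the prime factors of n with multiplicity, e.g. 147 -> [3, 7, 7]."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def generate_prime_item(rng):
    """Build one item that satisfies the replacement-set rules of the item form."""
    odd = range(11, 1000, 2)  # odd two- or three-digit integers
    primes = [n for n in odd if len(prime_factors(n)) == 1]
    # Assumed reading of "non-trivial": at least three prime factors.
    foils = [n for n in odd if len(prime_factors(n)) >= 3]
    key = rng.choice(primes)
    options = sorted(rng.sample(foils, 3) + [key])
    stem = "Which of the following numbers is a prime number?"
    return {"stem": stem, "options": options, "answer": key}

item = generate_prime_item(random.Random(0))
```

Every call yields a different member of the same domain, which is the sense in which the pool of items an item form creates is the domain to be assessed.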

No studies have been observed that deal with the feasibility of
item forms or empirical comparisons between this approach and others.
Hively, Patterson and Page (1968) studied the empirical properties of
items developed from item forms using Cronbach's generalizability theory
(Cronbach, Gleser, Nanda & Rajaratnam, 1972). Results of the study were
promising in that items were produced that showed response patterns that
suggested distinct and homogeneous classes of behavior.

The  most  significant  work  to  date  on  item  forms  was  the  five-year 
cooperative  project,  MINNEMAST  (Hively,  et  al.,  1973).    The  monograph 
documents  a  domain-based  test  development  from  item  forms,  providing  a 
rich  resource  of  examples  and  problems  encountered.     Foremost  among 
these  problems  is  the  extraordinary  cost  in  the  development  of  item 
forms  and  the  administration  and  scoring  of  test  items,  many  of  which 
were  not  machine  scorable. 

Further,  there  are  concerns  expressed  for  the  feasibility  of  such  an 
approach  (Popham,  1975,  p.  136).     Until   item  forms  can  be  made  more 
efficient,  their  potential  may  be  limited  to  subject  matter  that  is  more 
objectively  structured  and  identifiable. 

Millman and Outlaw (1977) have recently implemented item forms in
the tests used for several college courses. They have developed a special
programming language for a small-computer system that allows an
item writer to construct an "item program." This is a computer program
that directs the system to produce multiple questions. The item program
defines a structure for each question. Most of the wording of the item
can be fixed and parts of the item can be variables that are replaced to
create unique questions. Variable elements can be words or random numbers
or quantities that are mathematically computed. An example of a very
simple item form and an item program for a math problem is shown in
Figure 5. One advantage of this system is that only the item program
needs to be stored, not the individual items. A test can be constructed
and printed by the computer.


Item Form

"How much is (X) plus (Y)?"
Where X and Y are integers from 1 to 10.

Item Program

10 LET X = RANDOM (1, 10)
20 LET Y = RANDOM (1, 10)
30 QUESTION CONTENT " How much is ", X, " plus ", Y, " ? "
40 ANSWER CONTENT X + Y

Figure 5. Item form and item program from
Millman and Outlaw (1977)
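
For readers without access to the special-purpose language, the same item program can be approximated in a general-purpose language. The Python sketch below is an analogue under stated assumptions, not the Millman and Outlaw system itself: it assumes RANDOM (1, 10) draws an integer with both bounds included, which matches "integers from 1 to 10" in the item form, and the function name `item_program` is hypothetical.

```python
import random

def item_program(rng):
    """One run of the item program: returns (question, answer)."""
    x = rng.randint(1, 10)                   # statement 10: X is random, 1 to 10
    y = rng.randint(1, 10)                   # statement 20: Y is random, 1 to 10
    question = f"How much is {x} plus {y}?"  # statement 30: question content
    answer = x + y                           # statement 40: answer content
    return question, answer

# Only the program is stored; a five-item test is produced on demand.
rng = random.Random(1977)
test_form = [item_program(rng) for _ in range(5)]
```

Storing the generator rather than its output mirrors the advantage noted in the text: the individual items never need to be kept, since any number of them can be regenerated from the program.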


Linguistic-Based  Approaches 

Bormuth (1970) was the first to describe a technology of item
writing for assessing learning from prose material. He described rules
that are a series of directions which tell an item writer how to transform
segments of prose instruction into questions. Bormuth outlined two
types of transformations: (a) items derived from sentences and (b)
items derived from the relationships between sentences (1970, pp. 39-55).
Examples of sentence-derived items that assess the recall of prose
material are those created by the "wh-transformation." These items would
be written using a detailed set of rules summarized as follows: "Select
sentences from the instructional materials, replacing a 'wh' word such
as who, what, or where for the appropriate part, e.g., subject noun, in
each sentence." For instance, "The test developer computes the validity
coefficient," could be transformed to: "Who computes the validity
coefficient?" These are particularly useful because they can be written
to assess learning of each of several ideas in one sentence, and can be
made into either completion or multiple-choice format.
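
The simplest case of the wh-transformation, replacing a subject noun phrase at the start of a sentence, can be sketched in a few lines. The Python function below is a deliberately narrow illustration and not Bormuth's full rule set: the name `wh_transform` is hypothetical, the item writer must supply the phrase to be replaced, and sentences whose question word lies elsewhere would need syntactic reordering that this sketch does not attempt.

```python
def wh_transform(sentence, subject_phrase, wh_word="Who"):
    """Replace a leading subject noun phrase with a wh-word to form a question."""
    body = sentence.strip().rstrip(".")
    if not body.startswith(subject_phrase):
        raise ValueError("this sketch only handles a subject phrase at the start")
    remainder = body[len(subject_phrase):].strip()
    return f"{wh_word} {remainder}?"

question = wh_transform(
    "The test developer computes the validity coefficient.",
    "The test developer",
)
# question == "Who computes the validity coefficient?"
```

Because the rule is stated operationally, two item writers applying it to the same sentence and the same phrase obtain the same question.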

Through the use of paraphrasing, sentence-derived items can also be
written to test comprehension of prose material. Anderson (1972) has
emphasized the importance of paraphrasing and has defined it as the case
where (a) all substantive words in a sentence are replaced and (b) the
original and paraphrased sentences have equivalent meaning.

Questions can be developed from the relations between sentences;
for example, by questioning the cause of an action described in a prose
passage. For instance, the sentences (a) Jim hurt his hand, (b) He was
cleaning his knife, and (c) His knife accidentally slipped, can be
examined for implied causation, resulting in the question, "What caused
Jim's hurt hand?" (Bormuth, 1970, p. 54).

Finn (1975; Note 5) and Roid and Haladyna (1978; Note 6) have extended
the work of Bormuth by developing multiple-choice item-writing
methods for prose learning. Finn's original work (1975) involved a
rather lengthy, 82-step algorithm. A streamlined version of this algorithm
was developed by Roid and Finn (Note 7) and included the following
important steps:

1. Analyzing the text;

2. Selecting sentences by keywords;

3. Transformation of sentences into questions; and

4. Generation of foils for multiple-choice format.

Analyzing the text. To develop questions that measure important
aspects of a prose passage requires a selective screening of the text.
One approach to screening the prose material is to use a team of teachers
or curriculum experts to identify the "instructionally relevant" sentences.
Many instructional programs include sentences that are directions
to the student, references to illustrations, or other verbal information
that is not directly related to the learning objectives for the
program. Screening of this material by consensus would be essential for
the creation of meaningful, relevant items.

Another approach to text analysis was proposed by Finn (Note 5),
who used a word-frequency analysis of a prose passage. A prose passage
is screened by (a) counting the number of times that each noun or adjective
appears in the passage and (b) identifying the standard frequency
index of each noun or adjective. The standard frequency index of each
word is a numerical estimate of how often the word occurs in a large
sample of words from American textbooks (Carroll, Davies & Richman, 1971).
The Carroll, Davies and Richman book or its computer-tape version can be
used to get the standard frequency index of each word in the passage.
The word "the" has the highest standard frequency index of any word,
because the average American student is likely to encounter the word
"the" once in every ten words in a textbook. The word "incarnation,"
for example, has the lowest index because the average student is likely
to encounter this word less often than once in every billion words.

Selecting sentences by keywords. One approach to identifying the
important sentences in a passage would be to have instructors or content
experts underline the key sentences in the passage. A consensus of
markings can be used to identify the most important sentences. If this
consensus is based on learning objectives for the prose material, the
method becomes objective-based. The sentence-transformation methods to
be described are, therefore, domain-based for two reasons: (a) sentences
can be randomly sampled and (b) transformations can be operationally
defined.

The approach to identifying keywords in prose proposed by Finn
(Note 5) is to identify "high information" words: words that are relatively
rare in American English and occur infrequently in the passage.
The sentences in which these high information words occur can then be
sampled for transformation into items which assess important information
in the passage. Degree of information in this context is measured by
the amount of uncertainty in the meaning of a sentence that is created if
a word is deleted. High information words are those that are difficult
for students to guess if they are deleted from sentences, as is done
in a Cloze test (Culhane, 1970). Cloze tests are completion tests in
which every fifth word has been deleted from a prose passage. The task
for the student, then, is to fill in the missing words. The ease
with which a word is guessed by a student is an inverse measure of the
amount of information it provides. The task in a Cloze test is similar to
the exemplary problem in information theory (Shannon & Weaver, 1949) where a
person is receiving a message, but because of noise on the channel, he
is not always sure which message he hears (Finn, Note 8). The information
in a garbled message is a function of the amount of doubt the
receiver has about its meaning and is related to the probability of
occurrence of certain words or letters. A missing word which is a common
word in the English language would give less information, because
students would more easily guess that it completes a sentence.
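
The construction of a Cloze test as defined here, deleting every fifth word, is mechanical enough to sketch directly. The Python function below is a minimal illustration; the name `make_cloze`, the blank marker, and the sample passage are assumptions, and punctuation is left attached to whatever word it follows.

```python
def make_cloze(passage, every=5, blank="_____"):
    """Delete every `every`-th word; return the cloze text and the deleted words."""
    words = passage.split()
    deleted = []
    for i in range(every - 1, len(words), every):
        deleted.append(words[i])
        words[i] = blank
    return " ".join(words), deleted

text = ("The test developer computes the validity coefficient "
        "for each new form of the test")
cloze, answers = make_cloze(text)
# answers == ["the", "new"]
```

In this short example the first deleted word, "the", is easy to restore from context, while "new" is harder, which illustrates the point that common words carry less information than rarer ones.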

Finn (Note 8) has shown that the ease with which a word is
guessed on a Cloze test is predicted by two important measures derived
from a word-frequency analysis of a passage: (1) the standard frequency
index and (2) text frequency. Words that have a low standard frequency
index (infrequent in American textbooks) are defined as high in information.
However, there is one case in which the information of these
words is reduced in relation to a given passage. If the word is repeated
frequently (i.e., it has a high text frequency), the information value of
that word is reduced and students will guess it more often in a Cloze
test following reading of the passage. In other words, repetition of a
word, even if it is rare in American English, lowers its information
value. Candidates for good question words are those which are both rare
in American English (have a low standard frequency index) and occur
infrequently in a prose passage.
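
The two-part screen described above (low standard frequency index, low text frequency) can be sketched as a simple filter. The Python code below is an illustration only, not Finn's procedure: the index values in `sfi_table` are invented, the cutoffs `max_sfi` and `max_text_freq` are assumptions chosen for the example, and no part-of-speech screening is attempted, so restricting candidates to nouns and adjectives would be a separate step.

```python
import re
from collections import Counter

def high_information_words(passage, sfi, max_sfi=60, max_text_freq=1):
    """Return words that are rare in general usage and rare in the passage."""
    tokens = re.findall(r"[a-z]+", passage.lower())
    text_freq = Counter(tokens)  # how often each word occurs in this passage
    return sorted(
        word
        for word, count in text_freq.items()
        if word in sfi and sfi[word] <= max_sfi and count <= max_text_freq
    )

# Hypothetical index values for illustration only; real values would come
# from the Carroll, Davies and Richman (1971) word list.
sfi_table = {"the": 88.0, "test": 62.0, "validity": 45.0,
             "coefficient": 42.0, "developer": 48.0}
passage = (
    "The test developer computes the validity coefficient. "
    "The validity of the test depends on the criterion."
)
keywords = high_information_words(passage, sfi_table)
# keywords == ["coefficient", "developer"]
```

Here "validity" is rare in general usage but is excluded because it is repeated in the passage, which is the case in which repetition lowers a word's information value.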

Not all parts of speech are equally good question words, even though
they may be high information words. Verbs and adverbs, in particular,
require difficult transformations when removed from a sentence. For
example, the sentence, "Finn argued the point made by Bormuth," when
transformed to "What did Finn do to the point made by Bormuth?" seems
clumsy and seems to be a less important question than the question "Who
argued the point made by Bormuth?" According to Roid and Finn (Note 7),
the most promising parts of speech are adjectives and nouns, or phrases
that contain them.


Transformation of sentences into questions. Once an important word
and sentence have been identified for a question, the sentence must be
examined and prepared for transformation. Some sentences include references
to previous sentences, e.g., "This implies that . . ." A phrase
from the previous sentence must be inserted into the place of the
referent (e.g., in place of "This"). Also, sentences that are compound
or that contain long clauses that introduce more than one idea into the
sentence need to be separated. The portion of the sentence containing
the question word is separated and used by itself if possible (see Finn,
Note 5, for guidelines).

The next step is to eliminate the question word and to transform the
sentence into a question. The question word, usually an adjective, a
noun, or its phrase, is removed and is replaced with a wh-word. Where
several wordings are possible, an attempt is made to stay as close to the
wording of the original sentence as possible, unless paraphrasing is
being used.
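
The simplest case of this step, in which the question word opens the sentence, can be sketched as follows. This is a minimal illustration under stated assumptions and not Finn's published guidelines: the function `to_question` is hypothetical, and it deliberately refuses sentences in which the question phrase is not sentence-initial, since those require reordering that the sketch does not attempt.

```python
def to_question(sentence, question_phrase, wh_word):
    """Replace question_phrase with wh_word and punctuate as a question.

    Only handles the simple case in which the phrase opens the sentence,
    so that no reordering of the remaining words is needed.
    """
    body = sentence.strip().rstrip(".")
    if not body.startswith(question_phrase):
        raise ValueError("sketch only covers a sentence-initial phrase")
    rest = body[len(question_phrase):]
    return wh_word.capitalize() + rest + "?"

print(to_question("Finn argued the point made by Bormuth.", "Finn", "who"))
```

Because the rest of the sentence is carried over verbatim, the result stays as close to the original wording as possible.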

Sentence transformations do not produce 100% agreement among item
writers in all cases because of such things as the replacement of phrases
from previous sentences. Finn (1975, pp. 357-363) discusses some of the
discrepancies among item writers. One study (Roid, Haladyna, Shaughnessy
& Finn, Note 9) showed that differences between item writers were not
statistically significant when the Finn method was used.

Generation of foils. As is common knowledge among item writers, the
writing of good foils for multiple-choice questions is challenging. The
first step in an algorithmic generation of foils is to classify the
question words so that possible foils can be obtained from a list of words in
the same classification. The most logical source of foil words would
seem to be the prose passage itself.

Roid  and  his  colleagues  (Roid  &  Finn,  Note  7;  Roid,  Haladyna, 
Shaughnessy  &  Finn,  Note  9)  developed  a  technique  for  algorithmic  foil 
construction.    The  algorithm  uses  words  from  the  prose  passage  itself  as 
foils.    Two  variations  of  the  algorithm  were  developed:    one  for  nouns 
and  one  for  adjectives. 

In the case of nouns, those with a standard frequency index of 60
or less were semantically classified using the method of Frederiksen
(1975). For example, some nouns were classified as concrete inanimate
nouns. For a given question word, a random sample of three other nouns
from the passage that were similarly classified was drawn to create
foils.

In the case of adjectives, research on the semantic differential
technique was used as a basis for classifying adjectives from the passage
(Nunnally, 1967, pp. 536-638). In semantic differential research, three
factors are usually identified: (1) evaluation, such as "good," "bad,"
etc., (2) potency, such as "strong," "weak," etc., and (3) activity, such
as "quick," "slow," etc. In addition, Nunnally has defined a fourth
factor, "familiarity," such as "simple" or "complex." These four factors
were used to classify the adjectives in the passage that had standard
frequency indexes of less than 60. As a further screen of the adjectives,
the Dale-Chall List of 3,000 Familiar Words (Dale & Chall, 1948) was used.
The adjective needed to be absent from that list so that extremely common
adjectives could be eliminated. These common adjectives were suspected
to be too easy as foils.
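
The noun variation of the foil algorithm can be sketched in Python as shown here. The sketch assumes that the screening by standard frequency index and the semantic classification have already been done by hand; the function `draw_foils`, the class labels, and the word list are hypothetical and serve only to show the sampling step.

```python
import random

def draw_foils(question_word, classified_nouns, n_foils=3, rng=None):
    """Sample n_foils other nouns that share the question word's class.

    classified_nouns maps each eligible passage noun to a semantic class.
    """
    rng = rng or random.Random()
    target = classified_nouns[question_word]
    pool = sorted(
        noun for noun, cls in classified_nouns.items()
        if cls == target and noun != question_word
    )
    if len(pool) < n_foils:
        raise ValueError("not enough same-class nouns in the passage")
    return rng.sample(pool, n_foils)

# Hypothetical classification of passage nouns, invented for illustration.
nouns = {
    "lever": "concrete inanimate", "pulley": "concrete inanimate",
    "wedge": "concrete inanimate", "axle": "concrete inanimate",
    "gear": "concrete inanimate", "engineer": "animate", "force": "abstract",
}
foils = draw_foils("lever", nouns, rng=random.Random(0))
print(foils)
```

The adjective variation would follow the same pattern, with the four semantic differential factors as classes and an additional screen against a list of familiar words.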

Research on these linguistic-based methods for generating items has
been mainly limited to a series of studies which trace the development of
a method for creating appropriate multiple-choice CR test items. The
first of these studies, as described earlier, contrasted objective-based
and informal methods of item writing (Roid & Haladyna, 1978). The
objective-based method was more in keeping with the amplified objectives
approach. The results of this study showed that while objectives
provided guidelines in the preparation of items, one item writer's items
were about 10% more difficult than the other's. The consequence of
developing CR test forms based on any particular item writer can be great
with respect to misclassifying student examinee performance as adequate
or inadequate, as is typically done in many forms of systematic
instruction. Thus, item-writer bias is a phenomenon that affects the difficulty,
if not the quality, of CR test items produced informally or with
objectives. This study suggests that CR test-item writers should not proceed
from step 1, conceptualization, to step 3, item writing, or use loosely
stated objectives, as suggested in step 2.

In the second of this series of studies, Roid and Haladyna (Note 6)
examined the effects of variations in linguistic-based algorithms on
item characteristics including instructional sensitivity, a criterion
measure based on the tendency for items to exhibit change in difficulty
as a function of instruction.¹ Four item writers were compared on two
methods of selecting sentences, two types of question words, and two foil
construction methods. No significant differences between item writers
were found on item difficulties, indicating an absence of item-writer
bias. Keyword nouns, which are relatively rare words in American
textbooks that appear frequently in a prose passage, were found to be
unacceptable as question words. Algorithmic methods of foil writing were
found to be feasible. Thus, this study indicated that item-writer bias
could be eliminated through the use of certain rules dealing with the way
sentences are identified and transformed into multiple-choice questions.
The study also points to a need for further work on
the refinement of the algorithms in an effort to achieve more fully
automated or even computerized procedures.

¹ See Haladyna (1974) and Haladyna and Roid (1978) for fuller discussions
of this characteristic and measures of it.

In  a  more  recent  study  in  this  line  of  research,  Roid,  Haladyna, 
Shaughnessy  and  Finn  (Note  9)  examined  some  of  the  refinements  developed 
as  a  result  of  the  previous  study  in  contrast  to  methods  of  item  writing 
that  are  based  on  paraphrasing  of  keywords.     It  was  found  that  passages 
with  greater  density  (i.e.,  sections  that  provided  more  information) 
yielded  harder  items.    More  importantly,  letting  item  writers  have  more 
freedom  in  the  selection  of  foils  produced  better  items  based  on  the 
criterion  of  instructional  sensitivity.    Verbatim  transformations  led  to 
items with higher instructional sensitivity. The procedures examined
also led to greater control of item difficulty than previously observed.
This study documents some of the intriguing advantages of algorithmic
multiple-choice item-writing methods and points to the
need to further study and improve item-generating techniques based on
Bormuth's theory of achievement testing. The goal of reducing item-writer
bias was effectively achieved, and future work will concentrate
on making the process more cost effective and efficient.


Facet  Theory 

Structural facet theory (Foa, 1958) has existed for some time and
has mainly served as a research tool, particularly in the area of
attitude measurement. Only recently have there been applications to CR test
construction (Engel & Martuza, Note 10; Berk, Note 3). The purpose of
facet theory is to provide the structure and boundaries of a domain of
testing conditions. For this reason, facet theory is viewed as a method
for developing a population of items representing the domain of
instruction. The primary advantage of facet theory is that the analysis of
content has semantic meaning in a theoretical sense, and there is no need
to conduct empirical analyses to search for meaning. The logical analysis
of sentences leads to meaningful test items which are easily interpretable.
That is not to say that empirical observation is unnecessary in facet
theory, but that the theory a priori specifies the nature of the material
to be learned and tested. Thus, facet theory, like other similar
approaches, allows for an objective specification of the domain which is
the target of instruction.

Facet theory specifies the limits of the domain and the orderings
of its subparts. In the theory, two aspects are hypothesized: content
and statistical. With content, the domain is specified using a semantic
structure called the "mapping sentence." The content structure is the
framework for predicting the statistical structure, which is later tested
using observations, thus enabling the user to relate theory to
observation. In the context of achievement testing, the mapping sentence is a
mechanism for defining a content domain and a related set of test items
to measure achievement in that domain.

* The information presented here was mainly derived from the
excellent presentations of facet theory by Engel and Martuza (Note 10)
and by Berk (Note 3). The reader is referred to these original sources
for a more detailed treatment of facet theory for CR tests.

The mapping sentence has fixed and variable parts resembling an
item form shell. Parts of the sentence called "facets" are identified
which represent some specific information for testing, and the facet
elements operate in much the same manner that replacement sets operate
in item forms. The sum of all desirable patterns of facets constitutes
a facet design.

Using an example provided by Berk (Note 3, p. 2):

A. Domain Specification Strategy

   1. Sentence transformation
   2. Item forms
   3. Algorithms
   4. Mapping sentences

are most appropriate for defining these content domains

B. Content Domain

   1. Reading
   2. Language
   3. Mathematics
   4. Science
   5. Social studies

The  mapping  sentence  for  this  example  would  have  two  facets.    The  elements 
of  each  facet  are  ordered  in  some  meaningful  way.    There  is  a  set  of 
rules  for  choosing  facets  and  their  elements  (McGrath,  1967).    These  rules 
are  summarized  from  Berk's  paper  (Note  3): 

1. Objects should be classified by all properties or facets.

2. Each facet should be divided into an exhaustive set of
categories or elements.

3. The elements should be mutually exclusive; that is, each element
is classifiable into one and only one category.

4. The logical relationship among elements of a facet should be
specifiable.

5. The logical relationship among facets should be specified.

6. The facets should exhaust the domain of interest.

From the example given above, there are a total of twenty
combinations of elements from Facet A and Facet B. Thus, twenty statements
exhaust the domain of possibilities, and twenty true-false items are
possible. For example: Sentence transformation is most appropriate for
defining mathematics. This is A1B3, which is a false statement.
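
The count of twenty follows from crossing the four elements of Facet A with the five elements of Facet B. A short Python sketch of that enumeration is given here for illustration. The helper `statement` and the profile keys are assumptions of the example, and the sketch makes no attempt to decide which statements are true, since that judgment rests on the content analysis and not on the enumeration.

```python
from itertools import product

FACET_A = ["Sentence transformation", "Item forms", "Algorithms", "Mapping sentences"]
FACET_B = ["reading", "language", "mathematics", "science", "social studies"]

def statement(a_element, b_element):
    """Fill the fixed part of the mapping sentence with one element per facet."""
    verb = "are" if a_element.endswith("s") else "is"
    return f"{a_element} {verb} most appropriate for defining {b_element}."

# Key each statement by its facet profile, e.g. "A1B3".
domain = {
    f"A{i}B{j}": statement(a, b)
    for (i, a), (j, b) in product(enumerate(FACET_A, 1), enumerate(FACET_B, 1))
}

print(len(domain))
print(domain["A1B3"])
```

Each key identifies one cell of the facet design, so a true-false item can be traced back to the combination of elements that produced it.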

This particular facet design is useful for true-false or completion
formats. Building a multiple-choice domain of items is considerably
more complex. Besides identifying the correct answer for a particular
facet, the additional burden is to produce three or four plausible foils.
This process, described by Berk (Note 3), involves a logical analysis of
potential distractors which are drawn from the elements of the facets.
Using the previous example with some modification:

Item transformations are most appropriate for defining:

a. social studies
b. language
c. reading
d. mathematics

The benefits of facet designs, as well as other similar approaches,
were discussed by Engel and Martuza (Note 10):

1. Both item stem and foils can be systematically constructed.

2. Facet design is based on a theory of content and how content
is defined. Therefore, the identification of foils occurs in the
context of how foils are more or less attractive as incorrect responses. As
a consequence, incorrect responses have meaningful interpretations in a
diagnostic vein.

3. The procedures provide a logical connection between content and
the multiple-choice item.

4. Items produced may be logically compared with respect to
content difficulty and appropriateness, thus making the construction of
parallel test forms easier and less subject to the capriciousness which
exists when random sampling is used to create items.

As noted earlier, facet theory is a relatively young field of
content specification for domain-based CR tests. There is very little known
about its applicability to various subject matters or the empirical
characteristics of tests constructed using facet designs.

Engel and Martuza (Note 10) conducted an empirical study of facet
designs. The procedures led to functionally equivalent parallel forms
of tests. Further, results indicated that the method works as well
with highly structured material like mathematics as with more abstract
material. Finally, the study showed the feasibility of facet theory as
a method for domain-based CR test-item construction. It is also
interesting to note that, like amplified objectives, the facet design can be
used with an objective as the mapping sentence, thereby capitalizing on
existing objectives.

Like other approaches to domain-based testing, facet theory appears
quite promising. These seminal works by Engel and Martuza (Note 10) and
Berk (Note 3) provide a clear picture of the nature and potential of
facet theory. There remains, however, much work to be done to refine the
theory and to apply it to various subject matters as well as to compare
it to other domain-based approaches in an effort to uncover which set of
procedures is most efficient, feasible, and defensible in light of the
abstract conceptualization of instructional intent posited as the first
step in CR test development. As with other approaches, more development
coupled with empirical research should reveal the utility of facet theory
and its eventual role in CR testing.


Concept-Based  Testing 

The work of Markle and Tiemann on the teaching and testing of
concepts can be used to create domain-based tests that go beyond the factual
level of learning (Markle, 1975; Markle & Tiemann, 1974; Tiemann & Markle,
1978; Tiemann, Kroeker & Markle, Note 11; Tiemann & Markle, Note 12).
They have defined concepts as classes of objects, events, or relations
which vary among themselves and yet are all grouped together and called
by the same name. A student's understanding of a concept is tested by
checking for generalization to new examples and discrimination of
nonexamples. A set of examples and nonexamples that is different from
those used in teaching is used to test the student's understanding of
the concept. If we were teaching the concept "chair," we might use the
examples of a metal kitchen chair and an upholstered chair and the
nonexamples of a stool and a church pew in the teaching exercise. In
testing for understanding of the concept of chair we might use the examples
of a rocking chair and a rattan chair, and the nonexamples of a bench
and a love-seat.
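
The separation of teaching cases from testing cases can be sketched in Python using the chair illustration. This is a minimal sketch under stated assumptions and not a procedure taken from Markle and Tiemann: the function `assemble_concept_test` and its parameters are hypothetical, and the pools contain only the eight cases named in the text.

```python
import random

def assemble_concept_test(examples, nonexamples, taught, n_each=2, rng=None):
    """Draw test cases that were not used in teaching.

    Returns (case, is_example) pairs in shuffled order.
    """
    rng = rng or random.Random()
    new_examples = sorted(set(examples) - set(taught))
    new_nonexamples = sorted(set(nonexamples) - set(taught))
    cases = [(c, True) for c in rng.sample(new_examples, n_each)]
    cases += [(c, False) for c in rng.sample(new_nonexamples, n_each)]
    rng.shuffle(cases)
    return cases

examples = ["metal kitchen chair", "upholstered chair", "rocking chair", "rattan chair"]
nonexamples = ["stool", "church pew", "bench", "love-seat"]
taught = ["metal kitchen chair", "upholstered chair", "stool", "church pew"]

test_cases = assemble_concept_test(examples, nonexamples, taught, rng=random.Random(1))
for case, is_example in test_cases:
    print(case, "->", "example" if is_example else "nonexample")
```

A correct response accepts each new example (generalization) and rejects each new nonexample (discrimination).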

Tiemann and Markle (1978) provide guidelines and many practical
examples of the analysis of concepts. The analysis of concepts involves
listing the variable and critical attributes of the concept. A variable
attribute is a property of any particular example which can be varied
without changing an example to a nonexample. For instance, the number
of legs is variable in the concept chair, because we can have a
modernistic chair with a pedestal or a standard four-legged chair. Critical
attributes are true for every example of the concept, and if they are
removed, the example becomes a nonexample. For instance, the
requirements of a "single-person seat," a back and a rigid seat, are the
critical attributes of chair. Variable attributes are whether it has
rockers, arms, the material it is made of, etc. After the critical
attributes and variable attributes have been listed, and lists of
examples and nonexamples have been written, it then becomes possible to
construct domain-based criterion-referenced tests for a given concept.
Such a test would be constructed by choosing a random sample of examples
and nonexamples and systematically varying critical attributes and
variable attributes. An example of a concept analysis and a sample item
for the concept "antonym" is given in Figure 5 from Tiemann and Markle
(1978).

Markle and Tiemann recently extended their work to multiple
coordinate concepts (Tiemann, Kroeker & Markle, Note 11). An example of
coordinate concepts is the four behavioral concepts of positive and
negative reinforcement and positive and negative punishment. In this case,
the four concepts interrelate to the point where an example of one
concept is a nonexample of the other concept. Students need to learn to
reject a nonexample in one case but accept it as an example in the other
case. The Tiemann et al. study (Note 11) provides an example of how a
domain-based test is produced for a set of four coordinate concepts by
systematically sampling examples of each of the concepts and varying
them on their attribute dimensions.

As with other emerging technologies of test-item writing and domain
specification, there is little empirical work to support the approach
being advocated. The concept-based approach of Markle and Tiemann
provides a level of cognitive questioning that goes beyond levels typically
assessed by the linguistic-based approach. Concept-based testing
also serves to capture areas of instructional intent that are not handled
very well by item forms, which seem most applicable to discrete objects
such as those found in mathematics, science, and basic skills (e.g.,
spelling).


Computer-Based Methods

Computers have been used for many years as aids in assembling or
administering tests (e.g., Atkinson & Wilson, 1969). Early attempts
centered on the use of item banks containing all of the actual items from
which samples were drawn for testing. More sophisticated systems
included the composition of items such as was done in the mid-1960's in
the drill-and-practice exercises of the Stanford computer-assisted
instruction project (Suppes, Jerman & Groen, 1966). Computer programs with
the capability of generating items can be used to create domain-based
tests, and, for that reason, they will be described in more detail.

The major author languages used in computer-assisted instruction
have the capability of producing algorithms for domain-based test items.
Several of the CAI languages discussed by Roid (1974) such as
COURSEWRITER, PLANIT and TUTOR have functions which allow an item form to be
programmed as described by Millman and Outlaw (1977). For example,


Grammar Concept: Antonym*
A word which:

Critical Attributes

1. has a meaning opposite to the meaning of some other (given) word
2. is the same part of speech as the given word
3. is a new word, not a variation of the given word

Variable Attributes

4. may be drawn from various parts of speech:
   a) nouns     c) pronouns    e) adjectives
   b) verbs     d) adverbs     f) prepositions
5. relative syllabic length of two words may be:
   a) equal
   b) unequal
6. opposition of meaning may exist:
   a) across some continuum
   b) in a dichotomous sense

Teaching Examples                  Teaching Nonexamples

1. bad; good         4e,5a,6a      1. vain; greedy          lacks only 1
2. danger; safety    4a,5a,6a      2. reason; motive        lacks only 1
3. live; die         4b,5a,6b      3. we; us                lacks only 1
4. he; she           4c,5a,6b      4. above; upon           lacks only 1
5. rapidly; slowly   4d,5b,6a      5. merrily; sad          lacks only 2
6. in; out           4f,5a,6b      6. happy; unhappy        lacks only 3
                                   7. capable; incapable    lacks only 3
                                   8. disputable; agree     lacks only 2

Testing Examples                   Testing Nonexamples

1. hot; cold         4e,5a,6a      1. imaginary; fanciful   lacks only 1
2. loss; gain        4a,5a,6a      2. chair; couch          lacks only 1
3. elevate; lower    4b,5b,6a      3. behind; next to       lacks only 1
4. you; me           4c,5a,6b      4. gloom; bright         lacks only 2
5. gaily; sadly                    5. violent; non-violent  lacks only 3
6. over; under       4f,5a,6b      6. valid; invalid        lacks only 3
                                   7. weak; forcibly        lacks only 2

Sample Test Item

Which of the following pairs of words are antonyms?

a. imaginary -- fanciful
b. elevate -- lower
c. valid -- invalid
d. weak -- forcibly

Correct Answer: b

Figure 5. Example of a concept analysis used to develop domain-referenced
tests of concept learning.

*Adapted from P. W. Tiemann and Susan M. Markle. Analyzing Instructional Content: A Guide
to Instruction and Evaluation. Champaign, IL: Stipes Publ., 1978, p. 257. By permission
of the publisher and authors.

Atkinson (Atkinson & Wilson, 1969, p. 153) used COURSEWRITER to create
reading exercises and criterion tests for the Stanford reading programs.
A sample exercise is the sentence, "Jan saw the ____ hat," for which
the student is to choose one of a set of computer-assembled words, such
as "tan," "fat," "man" or "run." The fill-in answers are selected by
rules from words previously presented in lessons.

Another example is the work of Fremer and Anastasio (1969), who used
computers to help generate items for testing spelling. They conducted
an analysis of types of misspellings used by writers of spelling items.
A set of error generation rules was developed and programmed for a
computer. Error generation rules included the inversion of letters within
a word, omission of letters, or insertion of letters. An example for
the word "preferable" would be "perferable" or "preforable" or
"preferabal." Fremer and Anastasio found that computer-generated lists of
spelling items were judged highly useful by a panel of spelling test
developers.
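
The three rules named above (inversion, omission, and insertion of letters) can be sketched in Python as follows. This is an illustrative sketch only and not Fremer and Anastasio's program. The function names are hypothetical, and the sketch generates every variant the rules allow without the screening a test developer would apply.

```python
import string

def invert_letters(word):
    """Swap each pair of adjacent, differing letters."""
    return {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
        if word[i] != word[i + 1]
    }

def omit_letters(word):
    """Drop one letter at a time."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}

def insert_letters(word, alphabet=string.ascii_lowercase):
    """Insert one extra letter at every position."""
    return {
        word[:i] + letter + word[i:]
        for i in range(len(word) + 1)
        for letter in alphabet
    }

def misspellings(word):
    """Union of the three rule outputs, excluding the correct spelling."""
    return (invert_letters(word) | omit_letters(word) | insert_letters(word)) - {word}

candidates = misspellings("preferable")
print("perferable" in candidates)
```

The misspelling "perferable" arises from the inversion rule. Forms such as "preforable" replace one letter with another and would need a substitution rule, which this sketch does not include.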

Beginning with the pioneering work of Hively, Patterson and Page
(1968) and Osburn (1968), a great deal of work has been done on
domain-based tests in mathematics. For example, Hsu and Carlson (1973)
developed routines used to construct tests for the elementary mathematics
level of the Individually Prescribed Instruction program. They used the
concept of item forms developed by Hively in programming item-generation
routines. Hsu and Carlson make the important suggestion that statistics
for item forms be computed by collecting data from tryouts of each item
form. Because individual test items are automatically produced, the
best way to ensure the quality of test items is to improve the quality
of the item forms. By field testing and keeping statistics at the item
form level, it will be possible to develop higher quality domain-based
tests.

Beginning with the reported work of Osburn (1968), a number of
university professors have developed computer-generated testing systems,
particularly in the sciences. For example, Johnson (1973) has developed
a system for computer-generated items for chemistry at the college level.
Each of a series of subroutines defines an item form. These item forms
include numerical constants which are randomly generated by computer or
variable wordings which might include different names of chemical
compounds. An example of an item form and an individual item from this
system is given in Figure 5.
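
An item form of this kind can be sketched in Python. The sketch is patterned on the acid-base mixing item shown in the figure from Johnson's system, but it is not Johnson's code. The function names, the ranges for the random constants, and the chemistry assumptions (a strong base and a strong acid reacting one-to-one, with additive volumes) are choices made for the example. Only the keyed answer is produced; the foils are not generated.

```python
import random

STEM = ("IF {v1} ML. OF {m1} MOLAR NAOH IS MIXED WITH {v2} ML. OF "
        "{m2} MOLAR HNO3, THE FINAL SOLUTION WILL BE:")

def final_solution(v_base, m_base, v_acid, m_acid):
    """Return (molarity, ion) for the ion left in excess after mixing.

    Assumes a strong base and strong acid that neutralize one-to-one
    and additive volumes.
    """
    # Millimoles of base minus millimoles of acid; negative means acid is in excess.
    excess = v_base * m_base - v_acid * m_acid
    ion = "OH (-)" if excess >= 0 else "H (+)"
    return abs(excess) / (v_base + v_acid), ion

def generate_item(rng):
    """Fill the item form with randomly generated numerical constants."""
    v1, v2 = (round(rng.uniform(20.0, 60.0), 1) for _ in range(2))
    m1, m2 = (round(rng.uniform(0.5, 2.0), 2) for _ in range(2))
    molarity, ion = final_solution(v1, m1, v2, m2)
    stem = STEM.format(v1=v1, m1=m1, v2=v2, m2=m2)
    return stem, f"{molarity:.3g} MOLAR IN {ion}"

stem, key = generate_item(random.Random(7))
print(stem)
print("KEY:", key)
```

Each call with a different random state yields a new item from the same form, which is why statistics are more usefully kept at the level of the item form than of the individual item.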

Military uses of computerized item writing include the work by
Braby, Parrish, Gintard and Aagard (Note 13) at the Orlando Naval
Training Center. They developed algorithms for teaching and testing
symbol recognition.

Item Form

IF ____ ML. OF ____ MOLAR ____ IS MIXED WITH ____ ML. OF
____ MOLAR ____, THE FINAL SOLUTION WILL BE:

A) ____ MOLAR IN ____
B) ____ MOLAR IN ____
C) ____ MOLAR IN ____
D) ____ MOLAR IN ____
E) ____ MOLAR IN ____

Sample Item

IF 43.6 MLS. OF 1.50 MOLAR NAOH IS MIXED WITH 38.5 MLS. OF
1.14 MOLAR HNO3, THE FINAL SOLUTION WILL BE:

A) 1.33 MOLAR IN OH (-)
B) 1.10 MOLAR IN H (+)
C) 0.260 MOLAR IN H (+)
D) 1.33 MOLAR IN H (+)
E) 0.260 MOLAR IN OH (-)

Figure 5. Example of an item form and item from Johnson's
Computer-Generated Repeatable Chemistry Exam
System (Johnson, 1973).

Other computerized item-writing efforts have been described by
Olympia (1975) and Vickers (1973). The work of Vickers is interesting
in that it involves the computer generation of items useful in the
teaching of FORTRAN programming. A series of subroutines that employ
random numbers is used to compose FORTRAN-like statements. Then, the
student is asked to discriminate between correct and incorrect statements
or to classify types of variables. This is an excellent example of a
sophisticated item-writing method that produces a large domain of items.

In summary, there is a wide variety of applications of computerized
item-writing methods. Many of these methods are in use in military,
college, and university courses, particularly in the sciences. The
capability to implement these methods is available at most of the major
computer centers in the nation. Thus, the technology is available for
writing domain-based criterion-referenced tests. The challenge that
remains is in the specification of domains and the definition of
item-writing algorithms in a wide variety of subject-matter areas. Also,
more creative efforts are required to develop domains at the conceptual
and higher levels of learning following the recommendations of Tiemann and
Markle.


CONCLUSIONS  AND  RECOMMENDATIONS 


Instructional development has been capably served by several
principles of learning and testing which involve the use of instructional
objectives and other testing aids. Research on variables within
systematic instruction (reviewed by Block, 1971; Duchastel & Merrill, 1973;
Hartley & Davies, 1976; Melton, 1978; Robin, 1976) has been impressive.
Both systematic instruction and the use of instructional objectives
appear to have positive effects on learning. Unfortunately, objectives
permit too much freedom to the often inexpert item writer, which in
turn results in many items which are instructionally irrelevant or
psychometrically unsound. Despite logical and empirical methods of item
review, many of the problems in producing items may be avoided by
employing one of several domain-based item-generating approaches. The
five reviewed here are sound in theory, research and development.
Preliminary findings indicate a vast potential for the creation of large
groups of items which may form the basis of sound CR testing in the
future.

The key to all this activity is the acceptance of the process in CR
test development illustrated in Figure 1. Test theorists and
practitioners are in accord when they maintain a concern for a logical and
close correspondence between instruction and testing. Because
domain-based methods elaborate on objective-based methods, it is possible to
achieve almost perfect objectivity in the creation of test items.
Therefore, the item-writing methods being reviewed promise a more
scientific approach to item development, which in turn improves the
measuring of student achievement. This improvement should, in turn,
help instruction to more closely and accurately monitor student progress,
while educational researchers should find the work of creating
achievement tests as research tools more fruitful.

AIRCREW TRAINING RESEARCH - PROJECT ACTIVE


by 

Captain  Wayne  E.  Keates 
Staff  Officer  Analysis 
Air  Command  Headquarters 
Westwin,  Manitoba 
Canada R2R 0T0


The  views  and  opinions  expressed  are  those  of  the  author 
and  not  necessarily  those  of  the  Department  of  National 
Defence. 


A  paper  prepared  for  the  20th  Annual  Conference  of  the 
Military  Testing  Association  held  at  Oklahoma  City, 
Oklahoma,  USA  during  the  week  of  30  Oct  -  3  Nov  1978. 

In early 1976, as part of the on-going development of the new
Air Command in the Canadian Forces, an initial analysis of the pilot
training system was carried out. It quickly became evident that the
training data available was fragmented and that training units were
not in a position to collate and analyze the data required to monitor
the overall training process. Since Air Command had been tasked with
the responsibility for all air-related training, it was proposed that
a centralized information system be established. This system would
initially include only pilot and navigator training; but after
sufficient time had passed to evaluate the system, consideration would be
given to establishing a similar monitoring mechanism for all
air-related training.

This centralized information system was named the Air Command
Training Information System for Validation/Evaluation (Project ACTIVE).
Basically it is a longitudinal data collection process in which Air
Command training units forward relevant information to a central office
in Air Command Headquarters where it can be used for the management
of training. In addition, it is proposed that much of the data will
be fed back to the complete training system, after some reduction and
analysis. In this way, training units will have access to a
considerable amount of data which had earlier been unavailable to them and
will be better able to consider their own part within the total
training process.

At all stages of training we collect three types of information.

1. The first type may be considered biographical. It is used
to identify and describe the student. It includes information
such as age, entry plan, previous flying experience,
etc.

2. The second type is performance information. This merely
records how well the student has performed on each course.
At more advanced training levels this will also include an
assessment of how well prepared the student was for the
current course.

3. The third type is attitudinal information. For this, a
43-item training satisfaction questionnaire is administered
at each stage of training. Some of the more biographical
data is also related to attitudinal variables, such as
student's future employment preferences and military
assessment (adaptation to military life).
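
The three types of information listed above could be held in a single record per student per training stage. The following is only a minimal sketch of such a record, not the layout actually used by Project ACTIVE: the class name, the field names and the grouping helper are all hypothetical, and the sketch assumes one questionnaire administration per record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

QUESTIONNAIRE_ITEMS = 43  # the training satisfaction questionnaire has 43 items


@dataclass
class StageRecord:
    """One student's data for one training stage (hypothetical layout)."""
    student_id: str
    stage: str                          # e.g. "Primary Flying Training"
    biographical: Dict[str, object]     # age, entry plan, previous flying experience
    performance: Dict[str, float]       # course grade, academic marks, and so on
    attitude_responses: List[int]       # one answer per questionnaire item
    # Assessment of how well prepared the student was; advanced stages only.
    preparedness: Optional[Dict[str, int]] = None

    def __post_init__(self):
        if len(self.attitude_responses) != QUESTIONNAIRE_ITEMS:
            raise ValueError("expected 43 questionnaire responses")


def longitudinal_view(records):
    """Group stage records by student, keeping the order in which they arrive."""
    by_student = {}
    for rec in records:
        by_student.setdefault(rec.student_id, []).append(rec)
    return by_student
```

Grouping the records by student is what turns a set of separate course reports into the longitudinal picture the project is aiming for.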

As was mentioned, both pilot and navigator training are included
in Project ACTIVE. A brief look at each should make the data flow more
obvious.

Figure 1 depicts pilot progression through training to employment.
All pilot trainees complete Basic Officer Training, Primary Flying
Training and Basic Flying Training. Students for Basic Rotary Wing
Training are streamed from Basic Flying Training after hour 140; the
remainder continue to 200 hours and are sent to either high
performance or multi-engine, with a small number being retained in training
employment.

Figure 2 depicts the progression of one student through training,
showing the information collected at each stage. Since the
student is not under Air Command control during his Basic Officer
Training, detailed data collection is not possible at that stage.
However, his course grade is picked up later from his training records.

Primary Flying Training is a 27-hour course (about 7 weeks including
ground school) on the CT134 Musketeer held at Canadian Forces
Base Portage La Prairie. At this stage we collect biographical
information, grade, performance ratings, academic marks, officer development
ratings, military assessment and attitude questionnaire results. The
primary reporting form is the modified CF377, shown as Figure A1 in
Annex A. The military assessment is a 5-point rating of the student's
adaptation to military life. The attitude questionnaire is administered
shortly after the student's first solo. It should be mentioned that
the attitude questionnaire, at each stage, is administered by the
Base Personnel Selection Officer and forwarded directly to Air Command
Headquarters. At no time are individual results made available to
any training staff.

Continuing  with  Figure  2,  one  can  see  that  the  student  then 
proceeds  to  the  Basic  Flying  Course  at  Canadian  Forces  Base  Moose  Jaw. 
This  is  a  200-hour  course  (about  11  months)  on  the  CT114  Tutor.  As 
was  mentioned  earlier,  rotary  wing  candidates  continue  to  their  next 
course  after  140  hours.    At  the  end  of  this  course  the  student  graduates 
with  his  pilot's  wings  and,  if  a  cadet,  receives  his  commission.  The 
primary  reporting  form,  included  as  Figure  A2,  includes  biographical 
information,  grade,  academic  average,  and  military  assessment.  The 
attitude  questionnaire  is  administered  twice,  just  after  solo  and  a 
few  months  before  the  end  of  the  course.     In  addition,  we  receive  the 
student's  progress  book,  which  includes  detailed  particulars  about 
each  flight  and  trainer  session. 

In the particular example shown, the student then proceeds to
the Basic Fighter Course at Canadian Forces Base Cold Lake. Three
details about the reporting form (Figure A3) should be mentioned. One,
this is the first of the feedback forms. The rating categories shown
on page 2 of Figure A3 consist of skills the student should have
developed on his previous course. The staff must rate the student's
ability to perform these groups of tasks relative to the standard
which they expect of an incoming student. Two, page 3 of the form
provides a mechanism whereby the staff can specify the particular
tasks, within these categories, on which the student was especially
good or was deficient. The task card for these categories is included
as Figure A4. Three, on page 4 the training unit must assess the
standard of performance and rate of progress on this course. The
category list will differ for each unit and each aircraft type.
These categories were determined by the schools before data collection
began and are added to the form by the schools.
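
The rating rules for the feedback pages can be checked mechanically when forms arrive at Headquarters. The sketch below is illustrative only: it assumes the 1 to 4 rating legend printed on the form in Annex A and the form's instruction that only tasks above or below standard (1, 2 or 4) are listed individually. The function name and the dictionary layout are hypothetical.

```python
VALID_RATINGS = {1, 2, 3, 4}  # 1 unacceptable, 2 sub-standard, 3 standard, 4 highest order


def check_feedback_form(category_ratings, task_ratings):
    """Return a list of problems found in one completed feedback section.

    category_ratings: e.g. {"BA": 3, "BD": 2}; categories not exercised
        in the unit may simply be omitted.
    task_ratings: e.g. {"B 019": 2}; only non-standard tasks are listed.
    """
    problems = []
    for code, rating in category_ratings.items():
        if rating not in VALID_RATINGS:
            problems.append(f"category {code}: rating {rating} is not 1-4")
    for task, rating in task_ratings.items():
        if rating not in VALID_RATINGS:
            problems.append(f"task {task}: rating {rating} is not 1-4")
        elif rating == 3:
            # The form asks for task entries only where performance
            # was below or above standard.
            problems.append(f"task {task}: standard tasks should not be listed")
    return problems
```

An empty list means the section is internally consistent; it says nothing about whether the ratings themselves are fair.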

Data  are  collected  from  the  Operational  Training  Units  (OTUs) 
on  a  similar  4-page  form.    In  each  case  the  categories  in  Part  I  (the 
feedback  categories)  are  the  relevant  categories  for  the  preceding 
course  and  the  Part  II  categories  are  the  ones  specified  by  that  OTU. 

We  have  not  begun  data  collection  from  the  operational  squadrons 
but  it  is  expected  that  the  form  will  be  similar  to  the  feedback 
portion  of  the  OTU  form. 

Data collection for the multi-engine and rotary wing streams
is similar to the example given.

For  navigator  training  the  same  procedures  apply.  Navigator 
progression  is  shown  in  Figure  3  and  an  example  of  one  student's 
progression  is  shown  in  Figure  4.    The  student  begins  his  navigator 
training  at  the  Canadian  Forces  Air  Navigation  School  at  Canadian 
Forces  Base  Winnipeg.    At  the  end  of  this  course  he  receives  his 
wings  and  proceeds  to  one  of  the  Operational  Training  Units.  The 
reporting  forms  are  attached  as  Annex  B.    As  with  the  pilots,  the 
attitude  questionnaire  is  administered  at  each  stage  of  training. 

At this stage of its development Project ACTIVE has not provided
sufficient numbers of records to justify statistical analysis. This
situation is a normal condition in any research involving longitudinal
tracking. The large number of possible combinations and permutations
of the results is obvious.

A few more general points about the project should be mentioned
before closing. Project ACTIVE will allow this Headquarters to monitor
pilot and navigator training as a total process involving a number of
distinct stages rather than as a series of discrete training courses.
At the same time, this does not preclude the option of also examining
specific courses in isolation.

The information collected will be summarized and reported, not
only to the Headquarters officers responsible for training, but also
to all of the schools. Thus the schools will be in a better position
to participate in decision making. Attitudinal information will be
available to the schools, but only in aggregate form to maintain
confidentiality for the students. The informal and often haphazard
feedback network between schools will be strengthened by the addition
of a formal and structured feedback mechanism.
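
Reporting attitudinal information only in aggregate form can be sketched as follows. This is an illustration under stated assumptions, not the project's procedure: the function name and input layout are hypothetical, and the minimum group size below which a school's figures are withheld is an added assumption that the paper does not specify.

```python
from collections import defaultdict
from statistics import mean


def aggregate_attitudes(responses, min_group_size=5):
    """Summarize questionnaire scores by school without exposing individuals.

    responses: iterable of (school, score) pairs, one per student.
    min_group_size: assumed threshold; smaller groups are not reported.
    """
    by_school = defaultdict(list)
    for school, score in responses:
        by_school[school].append(score)
    return {
        school: {"n": len(scores), "mean": mean(scores)}
        for school, scores in by_school.items()
        if len(scores) >= min_group_size
    }
```

Withholding very small groups is one common way to keep an aggregate from revealing a single student's answers; other safeguards would serve equally well.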

The  specific  procedures  and  possible  outcomes  that  make  up 
Project  ACTIVE  are  not  new.    Most  of  the  procedures  have  been  used 
on  individual  courses  in  the  past.    What  Project  ACTIVE  does  promise 
is  the  opportunity  to  manage  and  monitor  the  longitudinal  training 
process  in  considerable  detail  to  ensure  that  our  training  is  the 
best we can make it.

PART I - WINGS VALIDATION

PROJECT ACTIVE - Form CF/K 2673

This form is the primary validation document for pilot training in the Canadian Forces, and for
members of other participating forces. You are asked to comment on the graduate's performance in your
OTU based upon the tasks accomplished in Basic Flying Training.

INSTRUCTIONS - PT I

1. You must assess categories of tasks as shown in the list below. Enter rating in the block beside
the category. (Omit those which are not exercised in your unit; i.e. Advanced Manoeuvres in M/E OTUs.)

RATING LEGEND

1. Graduate's CATEGORY or TASK performance is:
   1. completely unacceptable for this unit;
   2. sub-standard for this unit;
   3. standard for this unit; or
   4. of the highest order.

a. CATEGORY RATINGS

CATEGORY                                      RATING
                                              1   2   3   4
BA.  BASIC FLYING SKILLS
BB.  INTERMEDIATE MANOEUVRES
BC.  ADVANCED MANOEUVRES
BD.  BASIC INSTRUMENT MANOEUVRES
BE.  TACAN PROCEDURES
BF.  VOR
BG.  RADAR
BH.  ILS
BJ.  IFR CROSS-COUNTRY PROCEDURES
BK.  AIR NAVIGATION
BL.  NIGHT FLYING (DUALS)
BM.  BASIC FORMATION MANOEUVRES
BN.  INTERMEDIATE FORMATION MANOEUVRES
BP.  ADVANCED FORMATION MANOEUVRES

b. HOURS FLOWN AT COURSE ENTRY (NON-PIPELINE ONLY)
   M/E ____   HI PERF ____   TOTAL ____

c. WAITING TIME - MOOSE JAW to OTU (if applicable)
   MONTHS ____

d. TASK RATINGS

You should assess specific tasks if possible. Rate only those tasks that are below or above
standard (1, 2 or 4). Enter task number and rating in columns below. See Task Card for specific
tasks and numbers.

TASK   GRADE      TASK   GRADE      TASK   GRADE

ASSESSED BY:   SIN ____   RANK ____   SIGNATURE ____   POSITION ____
PART II - OTU COURSE ASSESSMENT

This is a record of the student's accomplishments during this course. The assessment categories
are the major phases of your course (eg Weapon Delivery, etc, etc) or the actual objectives in the CTS.
Two assessments per category are required. Firstly progress or 'speed of learning' and secondly, the
standard assessment as per Rating Legend on page 2.

INSTRUCTIONS

1. Enter all categories in spaces below.
2. Check appropriate column under Progress and Standards.
3. Check appropriate blocks in overall assessment.
4. Complete Military Assessment.
5. Ensure that Attitude Questionnaire is completed.

RATINGS - PROGRESS ASSESSMENT

1. Unacceptable progress
2. Slow Progress
3. Advanced as planned
4. Superior Progress

CATEGORY/OBJECTIVE               PROGRESS          STANDARD
                                 1   2   3   4     1   2   3   4

b. OVERALL ASSESSMENT

   PROGRESS RATING     [1] [2] [3] [4]
   STANDARDS RATING    [1] [2] [3] [4]

c. MILITARY ASSESSMENT

   HOW WELL HAS THIS OFFICER ADAPTED TO MILITARY LIFE?
   NOT WELL  □ □ □ □ □  VERY WELL
   (CHECK ONE - PIPELINE STUDENTS ONLY)

d. DATE OTU COMPLETED ____
   FLYING TIME ON COURSE ____

e. ADDITIONAL COMMENTS BY SQN COMD

Additional remarks including specific reasons for CT, comments on attitude or officer development
and opinion of entry standard, previous coursing, etc. may be appended on separate sheet.

Figure A4

PROJECT ACTIVE TASK/CATEGORY CARD 1

(BASIC FIXED WING - USE WITH FORM CF/K [number illegible] PART I)

BA - BASIC FLYING SKILLS
B 001  GROUND HANDLING
B 002  TAKE OFF
B 003  CLIMB
B 004  STRAIGHT & LEVEL FLT
B 005  CHANGING AIRSPEED
B 006  GENTLE TURNS
B 007  MEDIUM TURNS
B 009  DESCENTS
B 010  LEVEL-OFF
B 016  TRAFFIC PATTERNS
B 017  CIRCUIT (normal)
B 018  OVERSHOOT
B 019  LANDING (basic)

BB - INTERMEDIATE MANOEUVRES
B 020  STRAIGHT-IN LANDING
B 008  STEEP TURNS
B 011  SLOW FLYING
B 012  LANDING ATTITUDE STALL
B 013  FINAL-TURN STALLS
B 014  HIGH-SPEED STALLS
B 015  UNUSUAL ATTITUDES
B 021  CLOSED PATTERNS
B 022  FORCED LANDING
B 023  FLAPLESS LANDING
B 024  FORCED LANDING FROM IP
B 025  SPINS
B 026  RANDOM RADAR
B 027  MINIMUM-ROLL LANDING
B 028  SQUARE CIRCUIT
B 030  SLOW ROLL
B 031  LOOP (BELOW 30,000')
B 032  MAXIMUM-RATE TURNS
B 033  CLOVER LEAF
B 034  CUBAN EIGHT
B 037  BARREL ROLL
B 038  FOUR-POINT ROLL
B 039  HESITATION ROLL
B 040  ROLL-OFF-THE-TOP
B 041  HALF-ROLL & PULL-THRU
B 046  EMERGENCY DESCENT
B 049  ROLL-IN AND ROLL-OUT

BC - ADVANCED MANOEUVRES
B 029  UNUSUAL ATTITUDE SPIN
B 035  OFF-SPEED AEROBATICS
B 036  LOW FLYING
B 042  MULTIPLE AEROBATICS
B 043  MACH RUN
B 044  LOOP (ABOVE 30,000')
B 045  ROLL (above 30,000')
B 047  VERTICAL EIGHT
B 048  VERTICAL ROLL

BD - BASIC INSTRUMENT MANOEUVRES
B 060  STRAIGHT-AND-LEVEL FLIGHT
B 061  CHANGING AIRSPEED
B 062  GENTLE TURNS
B 063  MEDIUM TURNS
B 064  CLIMBS
B 065  DESCENTS
B 066  LEVEL-OFF
B 067  TURNS TO HEADING
B 068  RATED CLIMB
B 069  RATED DESCENT
B 070  STANDARD-RATE TURNS
B 071  STEEP TURNS
B 072  UNUSUAL ATTITUDE
B 073  TIMED TURNS
B 078  UHF HOMING
B 081  CLEARANCES ([illegible])
B 091  TIMED TURN (NO [illegible])
B 092  UNUSUAL ATTITUDES

BE - TACAN PROCEDURES
B 074  TACAN INTERCEPTION
B 075  TACAN TRACKING
B 076  TACAN DEPARTURE
B 077  TACAN ARCING
B 079  TACAN POINT-TO-POINT
B 080  TACAN HOLDING
B 082  TACAN APPROACH
B 083  TACAN MISSED APPROACH
B 084  MIN FUEL TACAN APPROACH

BF - RADAR
B 086  RANDOM RADAR
B 085  LANDING FROM RADAR
B 087  RADAR FINAL
B 088  RADAR SQUARE PATTERN
B 089  RADAR MISSED APPROACH
B 090  NO-COMPASS RADAR

BG - VOR
B 093  VOR INTERCEPTION
B 094  VOR TRACKING
B 095  VOR HOLDING
B 096  VOR APPROACH
B 097  LANDING FROM VOR APPROACH
B 098  VOR MISSED APPROACH

BH - ILS
B 099  RADAR VECTORED ILS
B 100  ILS BACK CRSE FINAL
B 101  ILS FRONT CRSE FINAL
B 102  LANDING FROM ILS APPROACH
B 103  ILS MISSED APPROACH
B 104  TACAN/ILS APPROACH

SEE OVER

BJ - IFR X-CTRY
B 105  PRE-FLIGHT PLANNING (X-CTRY)
B 106  IFR CLEARANCES
B 107  DEPARTURE
B 108  ENROUTE
B 109  TACAN APPROACH
B 110  ENROUTE RADAR DESCENT
B 111  RADAR FINAL APPROACH
B 112  VOR APPROACH
B 113  VOR/ILS APPROACH
B 114  TACAN/ILS APPROACH
B 115  TACAN/RADAR PICK-OFF
B 116  RADAR VECTORED ILS APPROACH
B 117  ILS BACK CRSE FINAL APPROACH
B 118  ILS FRONT CRSE FINAL APPROACH
B 119  LANDING FROM INSTRUMENT APPROACH
B 120  INSTRUMENT MISSED APPROACH
B 121  VISUAL APPROACH AND LANDING

BK - AIR NAVIGATION
B 130  PREPARATION AND FLIGHT PLANNING
B 131  MEDIUM-LEVEL NAVIGATION PROCEDURES
B 132  MAP READING
B 133  SET HEADING PROCEDURES
B 134  LOG KEEPING AND ENTRIES
B 135  PILOT ABILITY AND AIRMANSHIP
B 136  TAKE-OFF AND DEPARTURE
B 137  MEDIUM-LEVEL NAVIGATION PROCEDURES (MDR)
B 138  LOW-LEVEL NAVIGATION PROCEDURES (BASIC)

BL - NIGHT FLYING
B 140  GROUND HANDLING
B 141  TAKE-OFF
B 142  CLIMB
B 143  UNUSUAL ATTITUDE RECOVERY
B 144  TRAFFIC PATTERN
B 145  CIRCUIT
B 146  LANDING
B 147  OVERSHOOT
B 148  TACAN APPROACH
B 149  LANDING FROM INSTRUMENT APPROACH
B 150  RADAR APPROACH
B 151  MISSED APPROACH

BM - BASIC FORMATION MANOEUVRES
B 160  GROUND HANDLING
B 161  STATION KEEPING (TO 45°)
B 162  CHANGING STATION
B 163  WING TAKE-OFF
B 164  SELECTION OF ANCILLARIES
B 165  REJOIN
B 166  TRAIL
B 167  FLAT TURNS (4 PLANE)
B 168  GROUND HANDLING (4 PLANE)
B 169  STATION KEEPING (4 PLANE)
B 170  CHANGING STATION (4 PLANE)

BN - INTERMEDIATE FORMATION MANOEUVRES
B 171  WING LET-DOWN
B 172  TRAFFIC PATTERN
B 173  CIRCUIT
B 174  LANDING
B 175  STATION KEEPING (OVER 45°)
B 176  MISSED APPROACH
B 177  INTERVAL TAKE-OFF (4 PLANE)
B 178  JOIN-UP (4 PLANE)
B 179  TRAFFIC PATTERN (4 PLANE)
B 180  CIRCUIT (4 PLANE)
B 181  LANDING (4 PLANE)

BP - ADVANCED FORMATION MANOEUVRES
B 182  LEADING
B 183  INSTRUMENT APPROACH
B 184  FORMATION LANDING

Figure B1

ANNEX B

AIR COMMAND
TRAINING INFORMATION VALIDATION/EVALUATION REPORT

INSTRUCTIONS - Complete for each graduate or for students who fail or withdraw. Fill in Item 1 then check appropriate blocks.

1.  COURSE NO. ____
    CDT □   2LT □   LT □   CAPT □   MAJ □   LCOL □

2a. PLAN
    ROTP:  CMC □   CIV □
    DEO □
    OCTP:  MIL □   CIV □
    CROSS TRAINEE □
    RES [remainder illegible] □
    FNAT 1 □   FNAT 2 □

2b. AGE ____

2c. LANGUAGE:  ANGLO □   FRANCO □   OTHER (A) □   OTHER (F) □      PROFILE ____

BOTC GRADE ____

WAITING TIMES        UNDER 2 MOS   2-6 MOS   OVER 6 MOS   NOT APPLICABLE
PARU TO BOTC              □           □           □             □
BOTC TO CFANS             □           □           □             □

4a. CFANS RESULTS
    PASS [A] [B] [C]   FAIL □   RECOURSE □   WITHDRAWAL □

4b. FAILURE REASON
    FLYING SKILLS □   MEDICAL □   ACADEMIC □   DEFICIENT CONDUCT OR OFFR DEV □

    PROGRESS SATISFACTORY TO VW:   YES □   NO □
    HAS POTENTIAL TO GRADUATE:     YES □   NO □

    DISPOSITION DATE (YY MM DD) ____

    RECOURSE REASON
    FLYING □   MEDICAL □   ACADEMIC □   LANGUAGE □   OTHER □

    VOLUNTARY WITHDRAWAL (THIS SECTION FOR PSO USE ONLY)
    DOESN'T LIKE MILITARY LIFE □   FEELS INADEQUATE □   FEELS PROGRESS UNSATISFACTORY □
    [illegible] □   LANGUAGE □   FINANCIAL □
    TO ANOTHER CLASS □   FAMILY REASONS □

STUDENT POSTED TO:   VP □   HS □   CF101 □   TRANS □   OTHER □

STUDENT'S CHOICE:   1 ____   2 ____   NO PREFERENCE □

FLYING HOURS AT CFANS ____

ACADEMIC PERFORMANCE
    % MARK □□□
    STANDING:  TOP □   MIDDLE □   BOTTOM □

MILITARY ASSESSMENT
    HOW WELL HAS THIS OFFICER ADAPTED TO MILITARY LIFE?
    NOT WELL  □ □ □ □ □  VERY WELL
    (CHECK ONE - PIPELINE STUDENTS ONLY)

CF/K 2676 (MAR 78)

FORWARD TO - SO ANALYSIS
AIRCOM, CFB WINNIPEG

PROJECT ACTIVE

PART II - CATEGORY RATING

PROJECT ACTIVE - FORM 2676

This form is the primary evaluation document for Navigator training in the Canadian Forces, and
for members of other participating forces. You are asked to comment on the graduate's performance
at CFANS based upon the tasks accomplished in Basic Navigator Training.

INSTRUCTIONS - PT II

1. You must assess categories of tasks as shown in the list below. Enter rating in the block
beside the category.

RATING LEGEND

1. Graduate's CATEGORY performance is:
   1. Unsatisfactory - FAILED
   2. Achieved minimum rating
   3. Achieved average rating
   4. Achieved good rating

a. CATEGORY RATINGS

CATEGORY                                      RATING
                                              1   2   3   4
A.  MAPS, CHARTS & FLIGHT DOCUMENTS
B.  FLIGHT PLANNING
C.  PRE-FLIGHT
D.  FUEL MONITORING
E.  ELECTRONIC FIXING AIDS
F.  CELESTIAL FIXING
G.  MPP PROCEDURES
H.  POSITION COMPUTER USE
J.  GRID NAVIGATION
K.  GYRO NAVIGATION
L.  POST FLIGHT
M.  COMMUNICATIONS
N.  AIRMANSHIP
P.  AIR REGULATIONS
Q.  TASK CO-ORDINATION
R.  ACADEMIC EFFORT

CONFIDENTIAL (WHEN COMPLETED)

Figure B2

CONFIDENTIAL (WHEN COMPLETED)

ANNEX B

AIR COMMAND

Validation/Evaluation Report
OTUs

Part I. Wings Validation
Part II. OTU Course Report

GENERAL INSTRUCTIONS

This is the OTU validation and course result form. Part I (Validation) is to be completed
when student has demonstrated sufficient performance to assess. Part II (OTU Course
Assessment) is to be done when student has completed the course or is CT'd.

SIN ____   RANK ____   NAME ____
COURSE SERIAL ____   A/C TYPE ____   UNIT ____   UIC ____

CF/K 2677 MAR 78

FORWARD TO - SO ANALYSIS
AIRCOM, CFB WINNIPEG

PART I - WINGS VALIDATION

PROJECT ACTIVE - Form CF/K 2677

This form is the primary validation document for Navigator training in the Canadian Forces, and
for members of other participating forces. You are asked to comment on the graduate's performance
in your OTU based upon the tasks accomplished in Basic Navigator Training.

INSTRUCTIONS - PT I

1. You must assess categories of tasks as shown in the list below. Enter rating in the block beside
the category.

RATING LEGEND

1. CFANS graduate's CATEGORY or TASK performance is:
   1. completely unacceptable for this Unit/Sqn
   2. sub-standard for this Unit/Sqn
   3. standard for this Unit/Sqn
   4. of the highest order.

a. CATEGORY RATINGS

CATEGORY                                      RATING
                                              1   2   3   4
A.  MAPS, CHARTS & FLIGHT DOCUMENTS
B.  FLIGHT PLANNING
C.  PRE-FLIGHT
D.  FUEL MONITORING
E.  ELECTRONIC FIXING AIDS
F.  CELESTIAL FIXING
G.  MPP PROCEDURES
H.  POSITION COMPUTER USE
J.  GRID NAVIGATION
K.  GYRO NAVIGATION
L.  POST FLIGHT
M.  COMMUNICATIONS
N.  AIRMANSHIP
P.  AIR REGULATIONS
Q.  TASK CO-ORDINATION
R.  ACADEMIC EFFORT

b. HOURS FLOWN AT COURSE ENTRY (NON-PIPELINE ONLY)
   HS ____   AI ____   VP ____   TPT ____   TOTAL ____

c. DATE PART I COMPLETED ____

PART II - OTU COURSE ASSESSMENT

This is a record of the student's accomplishments during this course. The assessment categories
are the major phases of your course (eg Weapon Delivery, etc, etc) or the actual objectives in the CTS.
Two assessments per category are required. Firstly progress or 'speed of learning' and secondly, the
standard assessment as per Rating Legend on page 2.

INSTRUCTIONS

1. Enter all categories in spaces below.
2. Check appropriate column under Progress and Standards.
3. Check appropriate blocks in overall assessment.
4. Complete Military Assessment.
5. Ensure that Attitude Questionnaire is completed.

RATINGS - PROGRESS ASSESSMENT

1. Unacceptable progress
2. Slow Progress
3. Advanced as planned
4. Superior Progress

CATEGORY/OBJECTIVE               PROGRESS          STANDARD
                                 1   2   3   4     1   2   3   4

b. OVERALL ASSESSMENT

   PROGRESS RATING     [1] [2] [3] [4]
   STANDARDS RATING    [1] [2] [3] [4]

c. MILITARY ASSESSMENT

   HOW WELL HAS THIS OFFICER ADAPTED TO MILITARY LIFE?
   NOT WELL  □ □ □ □ □  VERY WELL
   (CHECK ONE - PIPELINE STUDENTS ONLY)

d. DATE OTU COMPLETED ____
   FLYING TIME ON COURSE ____

e. ADDITIONAL COMMENTS BY SQN COMD

Additional remarks including specific reasons for CT, comments on attitude or officer development
and opinion of entry standard, previous coursing, etc. may be appended on separate sheet.

DEVELOPMENT OF THE ARMY ROTC MANAGEMENT SIMULATION
PROGRAM AND INSTRUCTORS' ORIENTATION COURSE


The Army ROTC Management Simulation Program (MSP) is a modular
instructional package which provides ROTC cadets with the opportunity
to apply and develop basic management skills in realistic, simulated
situations. The accompanying Instructor Orientation Course (IOC) is a
self-paced, self-contained, instructional program designed to develop
the skills required to effectively teach the MSP. The purpose of this
paper is to describe the development, objectives and content of each of
these programs. In addition, the results of comprehensive field
evaluations of both programs will be reported.

Development  of  the  Management  Simulation  Program 

The MSP was conceptualized as a program which would provide
skills in the interpersonal and management areas underlying effective
leadership. The first step in the development process was determining the
management skills to be included in the program. After extensive literature
reviews and interviews with managers, several broad, focal management
skills were identified and then classified into four separate modules.
The first module deals with problem analysis and decision-making;
Module II is concerned with management planning and organizing skills;
the third module concentrates on management delegation and control; and
finally, the fourth module includes instruction in the interpersonal
skills required for effective management.

Each of the focal management skills was further divided into a
number of instructional units called essential elements. For example,
the essential elements for problem analysis are: defining the problem
as it relates to your function and goal, collecting and evaluating the
facts, determining the relationship between the facts and the problem,
and identifying the most likely cause of the problem.

The next step in the developmental process was the establishment
of the instructional components for the program. Decisions had to be
made on how best to present the material to the ROTC cadet. The most
distinctive aspect of the MSP was the inclusion of specially designed
simulations based on assessment center technology. The assessment
center method utilizes a series of simulations which are designed
to elicit behavior which will actually be required for a given job.
In the past, assessment center technology has been used for evaluating
management potential. The MSP is unique in that it incorporates
assessment center simulations into the educational process, thereby
allowing students the opportunity to actively participate in the
learning process. Two types of simulation exercises were developed
for the MSP. The first type of exercise was designed to elicit
and illustrate behaviors related to specific essential elements
underlying a particular management skill. The second type of
simulation was designed to elicit behaviors underlying all of the essential
elements for one or more of the focal management skills. The latter type
of simulation was called a "capstone" exercise.

Some of the simulations contained in the MSP include the Registrar's
Office Fact-Finding Exercise, Organizing and Planning the Bicentennial
Exercise, Delegation and Control In-Basket, Executive Director of
the Community Fund, etc. Each of these exercises requires a large
degree of active student participation. For example, the In-Basket,
the capstone exercise for Module III, requires the student to play the
role of a plant manager. He/she must handle accumulated letters,
notes and requests found in a simulated in-basket. The In-Basket
contains a total of 20 items which require the student to effectively
utilize the essential elements of delegation and control.

You may be wondering why civilian-management settings were selected
for use with ROTC cadets. The major factor in this decision was
that ROTC cadets are more familiar with the civilian management environment
than they are with the Army environment, although the skills underlying
the functions and responsibilities of both environments are the same.
It was felt that unfamiliar problem environments would distract the
student from learning the targeted management skills.

Although the simulations were the primary vehicle for instruction,
an integrated system linking the simulations with one another and
with the essential elements had to be developed and incorporated
into the MSP. Consequently, other program components were developed
including:

1. Relevant text material which precisely defines the nature
of each management skill.

2. Brief lectures which introduce each module and illustrate
the management skills.

3. Group presentation and/or discussion of the results of each
simulation.

4. Highly structured feedback and reinforcement of appropriate
responses relative to each exercise and the specific essential elements.

The complete MSP consists of four instructional modules containing
the simulation exercises and textual material; a videotape interview
for use in teaching cadets interpersonal skills; an instructor's workbook;
and a booklet containing student evaluation material. Each module
is a separate unit so that the instructor has the option of
teaching all four modules or any combination of one or more modules.

Evaluation  of  the  Management  Simulation  Program 

Since the MSP is a unique educational approach in many respects,
it was important to determine if the program was practical and feasible
for classroom use. Another equally important issue was whether or not
the MSP could generate and maintain student interest and involvement.
To answer these questions, 21 ROTC programs participated in a
comprehensive field evaluation. The instructors teaching the MSP as well as
the  cadets  enrolled  in  the  program  were  asked  to  complete  a  survey 
designed  to  gauge  cadet  and  instructor  reactions  to  the  program.  The 
quantitative  results  of  this  evaluation  are  too  lengthy  to  review 
within  the  time  limits  of  this  presentation.    However,  the  following 
conclusions  were  reached  based  on  the  survey  data: 

1. The simulation program was generally viewed by both instructors
and cadets as effective and interesting.

2. Student materials were generally found to be clear and complete.

3. Instructor material was found to be adequate.

4. In general, the length of the exercises was satisfactory to
both instructor and cadet.

Although  the  overall  evaluation  of  the  program  was  favorable,  a 
few  deficiencies  and  suggestions  for  improvement  surfaced.    Based  on 
these  suggestions,  changes  were  made  to  add  to  the  clarity  and/or 
comprehensiveness  of  the  MSP. 

Development of the Instructor Orientation Course

One of the suggestions made by instructors in the MSP evaluation
was the need for training prior to teaching the MSP. Before the
evaluation, 20 ROTC instructors met at a central location to receive
guidance on how to teach the MSP. Obviously, centralized training for
all  prospective  MSP  instructors  would  be  a  costly  and  impractical 
venture.    An  instructor's  training  program  had  to  be  developed  which 
was  cost-effective,  easy  to  administer  and  minimally  time  consuming 
for  instructors. 

In  order  to  meet  these  needs,  an  Instructor  Orientation  Course 
(IOC)  was  developed.    The  IOC  was  designed  to  accomplish  two  major 
objectives: 

1. To provide an opportunity for potential instructors of the
MSP to experience the program from a student perspective by actively
responding to each exercise and all the other program materials.

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2. To provide an opportunity to develop critical instructor
competencies relative to each component in the MSP by providing
instructional models and/or skill practice relative to each competency.

The first step of the development process concentrated on the
identification of the competencies required to effectively teach the
MSP. Two workshops were attended by prospective instructors and the
MSP developer. During the workshops, the developer presented each
component of the MSP to the prospective instructors and required their
participation as students. Once the student perspective was achieved,
the participants discussed the instructional skills necessary to teach
a specific component of the program. Considerable time was spent
identifying the types of activities and instruments which would best
develop the essential skills required for effective instruction. The
information generated in these workshops was analyzed to identify the
critical instructor competencies to be addressed in the IOC. Once
these were identified, the development of an audiotape-workbook system began.

The IOC consists of three major components. First, a short
videotape was developed to provide a detailed introduction to the MSP,
encourage  the  use  of  the  MSP,  and  provide  a  review  of  the  various  MSP 
and  IOC  program  components. 

The second major component is a series of four audiotapes, one
for each of the MSP modules. The audiotapes help the prospective
instructor in several ways:

1.  They  provide  an  overview  to  each  module. 

2.  They  provide  specific  directions  for  responding  to  the  student 
materials. 

3.  They  review  and  discuss  the  objectives  of  the  exercises  in 
the  student  modules  and  illustrate  typical  student  responses. 

4.  They  delineate  and  discuss  critical  instructor  competencies. 

5.  They  discuss  and  critique  competency  development  activity. 

6.  They  clarify  the  role  of  the  instructor  in  administering  the 
student  activities. 

Lastly, a set of instructor workbooks was developed to be used in
conjunction with the audiotapes. It was determined that only Modules I,
II and III required workbooks. This decision was predicated on the fact
that the student materials for Module IV were already in a format which
could be used effectively with the audiotape. The evaluation materials
were unique and were already adequately addressed in the Instructor's
Manual. The Instructor's Workbooks were designed to accomplish the
following objectives:

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1. To provide the student materials for the instructor to study.

2. To provide a format for the instructor to respond to student
materials.

3. To provide an opportunity for self-evaluation of responses to
student materials.

4. To provide activities to aid in the development of instructor
competencies.

5. To provide an opportunity for self-evaluation of the competency
development activities.

Instructor Orientation Course Evaluation


Since the MSP represented an innovative and relatively complex
instructional approach, the IOC had to overcome resistance to a new
mode of instruction, provide a student perspective of the program
materials, and [illegible] and develop instructor competencies for
teaching each of the major program components. Consequently, a
three-phase evaluation was designed and implemented. The phases included a
small-scale developer's [illegible], a telephone interview with eight ROTC
instructors, and a mail questionnaire sent to 16 ROTC instructors.
Again, the detailed results are too lengthy to review here. In general,
however, the IOC was well received. As one instructor stated: "I
enjoyed participating in the program. It was a valuable refresher as
well as a [illegible]"

Results of the mail questionnaire revealed that instructors found
the videotape informative, interesting, and of high quality. After
seeing the tape, most instructors highly recommended that the MSP be
incorporated into their ROTC curriculum. The audiotapes were rated as
effective in preparing instructors to teach the MSP. Lastly, the
workshops were evaluated as useful in developing the teaching
competencies required to teach the MSP.

The evaluation was also useful in identifying some of the program
deficiencies before it became fully operational. The course materials
had to be reorganized into one package, and a reference list on
management was developed and included as part of the course. In addition, the
videotape was shortened and filmed over with a professional narrator.
Other minor changes aimed at clarifying the instructions and material
were also instituted.

Conclusions

The Army Manual, FM 22-100, defines management as "the
process of planning, organizing, coordinating, directing and controlling
resources such as men, materials, time and money to accomplish the
organizational mission." The MSP was designed with this purpose in
mind: to prepare new lieutenants for their jobs as leaders and
managers.

The MSP and IOC are novel in that they apply assessment center
technology to the educational process. The simulation exercises spark
student interest in material that otherwise might be somewhat dry.
Evaluation of both programs indicates that this method of instruction
is highly stimulating and effective. The evaluation also allowed
program deficiencies to be corrected before the final version was
produced. The MSP and IOC have recently been distributed to all ROTC
regions through TRADOC. It is expected that they will be widely used by
host institutions in the coming year.
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The Army, in terms of training hardware and software systems, certainly
could be described as the world's largest buyer of training products.
In terms of only training products, a conservative estimate of
expenditures for 1978 would be approximately $40,000,000.

The dilemma facing any customer is how to get the best product for the
least amount of money. Specifically, in the training arena, the question
boils down to, "How does an organization buy 'good design' in its training
materials?"

In the area of equipment technology, the answer is simple: design
specifications. But in the area of training technology, do we presently have,
and are we using, appropriate design specifications?

The purpose of this presentation is to suggest potential inroads for
identifying design considerations that may be appropriate, to describe them
in terms of the Army's Training Extension Course (TEC) Program, to examine
their research basis, and to offer recommendations on how to insure that
good design can be obtained if it can be recognized.

It is essential to state that this presentation is restricting its input
to only the design phase of training development, acknowledging, as a
given, that the "What to train" decision has been made in accordance with
a systematic process.

A paraphrased restatement, then, of the original question is: "How do you
buy good design in your training materials," and equally important, "how
do you recognize it?"

To establish a reference point, "good design" will be loosely defined as
the utilization of the "best" learning theories during the design of
instructional materials for a given set of instructional objectives and
a specified target population, and will be evaluated in terms of efficiency
and effectiveness.

TEC  History 

The U.S. Army, subsequent to the Vietnam War, was able to examine its
training programs, and found that there was an acknowledged training deficiency
in terms of individual competencies and unit proficiency. The Training
Extension Course (TEC) Program was developed as one attempt to remedy this
deficiency.

The scope of TEC is mind boggling. Since 1973, 1,050 lessons have been
produced, 3000 lessons are in various stages of development, and [number
illegible] are projected to be developed each year. The total expenditure
for the TEC program to date is over 120 million dollars.
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TEC utilizes a multimedia format and capitalizes on the subject matter
expertise found in the various service schools to produce exportable
training material capable of being used by the individual soldier, both
active duty and reserve components. The vast majority of these TEC
lessons are in an audiovisual format utilizing a closed-loop filmstrip
and an audio cassette, which are played on a Beseler Cue-See projector.

The TEC objectives were:

INDIVIDUAL SOLDIER - To provide packaged, validated,
self-administered, individualized, and self-paced instructional
materials to soldiers in units to teach those necessary
skills for job/duty proficiency required in both peacetime
and wartime environments.

SMALL UNIT COMMANDER - To assist the commander in reducing
[illegible] material preparation time and resources by
[illegible] training methods and resources [illegible]
the TEC materials which facilitate the small unit
commander's role as a training manager. (TEC, 197[illegible])

Presently, audiovisual materials have emerged as the primary media for
TEC. They offer replicability of systematically designed instruction,
ease of administration, adaptability to self-paced, individualized,
criterion-referenced instruction, and an attractiveness to the target
population. What are the design guidelines that are driving this
system?

Since the TEC program is task specific, self-paced, and performance
oriented, it presupposes learning will occur because of the subject's
interaction with these materials. A recent study observed that, "when
the student is considered as an active agent in his own learning, it
becomes necessary to emphasize those student activities and processes which
give rise to learning."

(Bertou, Clemen, & Lambert, 1972)

The field of study that deals with the manipulation of events and
activities within instruction to give rise to learning was labeled
"mathemagenics" by Ernest Rothkopf (1963).

This concept is precisely the focus of my research: "What are those
instructional events that, if present in an audiovisual lesson, will give
rise to learning?"

The vast majority of the research on the mathemagenic theory has been
conducted with printed text. Little research has been conducted in the
area of audio-visual instruction utilizing the mathemagenics concepts.
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The Army, since the elimination of the draft, has experienced a change in
the ability levels of soldiers now entering. In preparing its
instructional materials, technical manuals, job performance aids, and
soldier-related publications, it must accommodate the mental processing abilities
of its soldiers. The Army must insure that each of its instructional
programs can produce an optimal amount of learning, given a wide variance
in degrees of soldier ability.

If the Army is getting a soldier with a lower ability level, perhaps these
individuals, labeled as having lower ability, actually just don't have
those requisite learning skills of attention, perceptual processing, association,
abstraction, and encoding which are important during the information
presentation phase of instruction.

There are several possible strategies that could be adopted to accommodate
the soldier. An audiovisual format is a means of circumventing potential
reading disabilities. Highlighting or emphasizing important points with
[illegible] may compensate for poor attention. Re-presentation or repeated
reviews of a lesson may affect retention. Providing an advance organizer
may set the structure and facilitate acquisition. Questioning strategies
[illegible] have some obvious functions: review questions in their recall
of essential prerequisite skills, preadjunctive questions as a focusing
instrument for selective attention, and postadjunctive questions as an
attention maintenance function and general search strategy. Feedback has
traditionally been an essential component of instruction, along with
practice exercise and some type of a self-evaluation. But what direction
has research suggested that we go?

If we look at printed materials, it seems that "It is what the student
does with the words he reads while he reads them that determines the
efficiency of learning" (Frase, 1968). Appending a corollary to this
theorem: it may well be what the student does with the information
presented (orally, visually, or problematically) while engaged in the learning
process that determines the effectiveness of learning.

Some research on re-presentation of lessons offers some intriguing findings:
the first semantic encoding by a learner is relatively stable over time in
spite of re-presentation of the material and successive retesting over the
material. It was observed that:


subjects  are  unable  to  profit  so  much  as  one  might 
expect  them  to  from  the  opportunities  for  improvement 
and  for  making  corrections  that  appear  to  be  provided 
by  the  repeated  presentation  of  the  material.  The 
version  that  an  individual  has  himself  reproduced 
appears  to  be  particularly  stable  in  his  memory,  and 
hence resistant to changes in the direction either
of  increased  accuracy  or  increased  forgetting.    (Howe,  1977) 




Another research area that has offered some interesting insights deals
with the use of feedback. Feedback can be described as the information
presented to the subject immediately after a response to a question
that enables the subject to judge the correctness or completeness of
that response. Some generalizations have emerged:

1) Feedback is not important if the student has made a correct response
to the question in instructional materials.

2) Feedback is critical if the student has made an incorrect response.

3) Feedback is appropriate only when the student has made a faulty
interpretation of the materials or question; it is not appropriate for
a lack of understanding.

4) If the feedback is readily available it will have no effect.

5) The delayed presentation of feedback for a day increases retention
and performance (Kulhavy, 1977).

Undoubtedly, there are appropriate and inappropriate strategies for the
design of audio-visual lessons like TEC. Allen (1975) has tried to
generalize from research, "What can the designer and producer do to
manipulate, arrange, emphasize, or enhance the way the message is
presented to optimize learning from it." He concluded that ..."it would
appear, therefore, that both empirical evidence and theory point to
greater benefit for lower ability learners from procedures that give
direction to their inspectional behavior of the instructional stimuli to
which they are exposed. Such techniques would be expected to compensate
for their poor attentional and discriminational abilities." (Allen, 1975)

The research over the last 15 years has shown that adjunctive questions,
when inserted within an instructional package, can enhance learning, assist
in attention, and accommodate poor discriminational abilities.

It could be shown that the gifted learners are those who have acquired,
developed, and internalized the mathemagenic aids that facilitate
learning. But for the inattentive, it may well be, as Rothkopf has proposed,
that "...It is under conditions of ineffective mathemagenic activity
that treatments such as adjunct questions have produced the best results"
(1974).

Traditionally, two types of adjunctive questions have been discussed:
preadjunctive, meaning those questions placed in front of the material
they relate to, and postadjunctive, or those that come after it.

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Considering the role of the preadjunctive question, Peeck (1970) has
summarized that the "experiments with pre-questions...seem to indicate
that prequestions are useful when retention of certain specific
information is aimed at, though depression of retention of other contents
may have to be taken into the bargain."

The usefulness of postadjunctive questions to facilitate learning and
retention is well documented (Rothkopf, 1966; Rothkopf & Bisbicos, 1967;
Swenson & Kulhavy, 1974). Adjunctive questions, when incorporated into
a printed text, have assisted in the learning process.

But do they assist, as a mathemagenic aid, in audiovisual lessons?
And if so, where should they be placed, and what type of question is
best?

These are just the prelude to the plethora of questions that must be
raised if "good design" is to be bought. Should review questions be
used in a series of lessons? Should feedback be provided after a question?
Should an audiovisual lesson have a practice exercise? Should there
be a self-evaluation at the end of the lesson, and should it be different
from the within-program questions?

Depending upon the material being taught and the target population
participating, the answer(s) may be yes or no.

The following examples are provided to illustrate this point. Recently,
as part of a research study, 218 TEC lessons were observed using a
checklist to monitor the presence or absence of mathemagenic aids. These
lessons were representative of every TEC contractor, service school,
subject matter area, and developmental year. This study indicated that
there is extreme variance in the design characteristics of the lessons
selected, in terms of the presence of: review questions, preadjunctive
questions, postadjunctive questions, feedback, practice exercise, and
self-evaluation.
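
The bookkeeping behind such a checklist can be sketched in a few lines of Python. This is only an illustration of the tallying idea: the function names, lesson identifiers, and observations below are hypothetical, invented for the example, and are not data from the 218-lesson study.

```python
from collections import Counter

# The six design features named in the checklist study.
AIDS = (
    "review_questions",
    "preadjunctive_questions",
    "postadjunctive_questions",
    "feedback",
    "practice_exercise",
    "self_evaluation",
)


def tally_aids(lessons):
    """Count how many lessons contain each aid.

    `lessons` maps a lesson identifier to the set of aids observed in it.
    """
    counts = Counter({aid: 0 for aid in AIDS})
    for observed in lessons.values():
        unknown = set(observed) - set(AIDS)
        if unknown:
            raise ValueError(f"unrecognized aid(s): {sorted(unknown)}")
        counts.update(observed)  # adds one per aid present in this lesson
    return dict(counts)


def design_profiles(lessons):
    """Count how many lessons share each exact combination of aids."""
    return Counter(frozenset(observed) for observed in lessons.values())


# Hypothetical checklist records, invented for illustration only.
example_lessons = {
    "lesson_a": {"review_questions", "feedback"},
    "lesson_b": {"postadjunctive_questions", "feedback", "self_evaluation"},
    "lesson_c": set(),
}

if __name__ == "__main__":
    for aid, count in tally_aids(example_lessons).items():
        print(f"{aid}: {count} of {len(example_lessons)} lessons")
    print(f"distinct design profiles: {len(design_profiles(example_lessons))}")
```

Counting distinct profiles, rather than only per-aid totals, is one simple way to make "variance in design characteristics" visible: the more profiles relative to lessons, the less agreement there is on a design strategy.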
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From  these  few  examples  it  can  be  seen  that  there  are  several  different 
design  approaches  out  there.    Recognize  that  I  am  not  proposing  a  singular 
best design strategy that combines all of the above. Rather, what I am
proposing is that the appropriate mathemagenic aid(s) be inserted into
the  lesson  at  the  right  place  as  a  primary  design  guideline  to  facilitate 
both  acquisition  and  retention. 

Certainly the purchaser of a product has to trust the professional
competence of the contractor, but if you have not achieved an agreed upon
design strategy that is defensible to other professionals, then how can
you ever buy good design? To which the rejoinder is, but all of these
lessons discussed above were validated on the target population. Is
validation, then, as it is presently practiced, a true indicator to the
buyer that he has purchased the "best design" possible in terms of
efficiency and effectiveness?

As  previously  suggested,  there  is  no  "best  design  strategy."    But  there 
are  suggested  design  strategies  that  should  seriously  be  considered 
during  the  design  of  audiovisual  lessons.    If  the  set  of  objectives  being 
taught are distributed and sequenced over a series of lessons, then
review  questions  to  facilitate  the  recall  of  prerequisite  or  previously 
learned  information  are  probably  called  for. 

If there are sections of the lesson that are not intrinsically
stimulating or are particularly important concepts, postadjunctive questions
can  perform  both  a  backward  review  function  and  an  arousal  function.  An 
appropriate  combination  of  lower  order  and  higher  order  questions  may 
provide  the  best  combination. 

If it is important that the student actually master the objective and
demonstrate competence, then a self-evaluation exercise at the end of
the lesson can provide the opportunity to put it all together one more
time, and identify any problem areas. But if this exercise is identical
to the within-program questions, it may test only short term memory and
not acquisition of the concept.

It may well be that feedback becomes more of a crutch than an aid.
Utilizing the guidelines previously identified, feedback should be
included as a design element, but the unique conditions under which it
may  be  beneficial  must  be  considered. 

Practice  Exercise  as  a  mathemagenic  aid  is  indisputable  for  it  allows 
the  student  to  put  what's  in  his  head  into  his  hands;  and  by  doing  it 
with  his  hands,  he  may  well  be  able  to  solidify  the  actions  into  the 
process and insure retention. But practice exercise for practice
exercise's sake doesn't accomplish anything.
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The  essential  ingredient  emerging  from  all  of  this  discussion  may  well 
be  the  synergistic  effect  of  all  of  these  design  considerations  when 
artfully  employed  together. 

Good  design  is  not  an  accident,  either  in  terms  of  the  buyer  or  the 
seller. It must be planned for and incorporated as a design
specification within the contract. If the goal of the lesson is to achieve
a  70%  retention  after  10  days,  then  it  seems  obvious  that  the  validation 
should  reflect  this  performance  description. 
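
As a rough illustration of writing such a goal into a validation check, the sketch below reads "70% retention after 10 days" as "at least 70 percent of the validation group passes a test given 10 days after the lesson." That reading, the passing score, the function names, and the scores shown are all assumptions made for the example; the specification itself would have to define them.

```python
def retention_rate(delayed_scores, passing_score):
    """Return the share of students who pass the delayed test."""
    if not delayed_scores:
        raise ValueError("at least one delayed test score is required")
    passed = sum(1 for score in delayed_scores if score >= passing_score)
    return passed / len(delayed_scores)


def meets_specification(delayed_scores, passing_score, required_rate=0.70):
    """Check delayed-test results against the contracted retention rate."""
    return retention_rate(delayed_scores, passing_score) >= required_rate


# Hypothetical scores from a test given 10 days after the lesson.
example_scores = [80, 90, 60, 75, 50, 85, 70, 95, 40, 88]

if __name__ == "__main__":
    rate = retention_rate(example_scores, passing_score=70)
    print(f"retention rate: {rate:.0%}")
    print("meets specification:", meets_specification(example_scores, 70))
```

The point of the sketch is only that the validation measure and the contract language use the same delay and the same threshold; a validation based on an immediate post-test would not answer the question the specification asks.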

The  specifics  of  my  research  have  been  intentionally  glossed  over,  so  as 
to  show  the  big  picture  and  provide  a  forum  for  discussion.  Recognizing 
the  role  and  mission  of  the  attendees  at  this  conference,  we  represent 
a  tremendously  large  organization  that  literally  spends  hundreds  of 
millions  of  dollars  per  year  on  training.  I  think  that  all  too  often  we 
fail  to  ask  of  ourselves  an  important  question.  Are  we  presently  getting 
the  best  designed  training  for  the  money  that  is  spent? 


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Background 


Coast Guard Class "A" schools train first-tour enlisted personnel
for Coast Guard jobs at the E-4 level. Course work in the three
schools included in this study (Aviation Electronics Technician
(AT), Damage Controlman (DC), and Radioman (RM)) ranges from 15
to 28 weeks of instruction. School curricula are designed to
train the rating-specific knowledges and skills outlined in the
Enlisted Qualifications Manual, which specifies minimum
qualification requirements for each Coast Guard specialty.

The  purpose  of  this  study  was  to  develop  and  demonstrate  procedures 
for evaluating the content validity of the "A" school curricula
for preparing and selecting personnel for the three job specialties.
Validation is the process of demonstrating that job selection or
assignment  procedures  are  related  to  job  performance.  When 
selection  procedures  are  intended  to  be  a  representative  sample 
of  the  job  performance  domain  (the  set  of  all  tasks  to  be  performed 
on  the  job)  content  validation  is  the  most  appropriate  validation 
strategy.    The  recently  issued  Uniform  Guidelines  on  Employee 
Selection  Procedures  state,  "Where  a  measure  of  success  in  a 
training  program  is  used  as  a  selection  procedure  and  the  content 
of  a  training  program  is  justified  on  the  basis  of  content  validity, 
the  use  should  be  justified  on  the  relationship  between  the 
content  of  the  training  program  and  the  content  of  the  job." 

The  method  used  to  demonstrate  this  relationship  should  have  two 
properties.     First,  it  should  be  as  specific  and  precise  as 
possible. This will increase the reliability and replicability
of  the  results,  as  well  as  making  more  obvious  the  grounds  for 
concluding  that  the  fit  between  training  and  the  job  is  either 
good  or  poor.     Second,  the  procedure  for  evaluating  the  validity 
of  the  training  curriculum  should  be  independent  of  the  training 
process.     Ideally,  this  means  that  people  responsible  for  developing 
a  curriculum  should  not  be  asked  to  evaluate  how  well  that  program 
fits  the  job--there  is  a  potential  conflict  of  interest  which 
could  operate,  however  unintentionally,  to  influence  the  results. 
Where  training  personnel  must  be  used  to  assess  curriculum  validity, 
the  validity  evaluation  procedures  should  be  made  as  standardized 
and  as  explicit  as  possible,  so  that  the  evaluation  task  will  be 
as  objective  and  clear-cut  as  possible  for  all  participants. 

A  poor  fit  between  training  content  and  job  requirements  has 
several  implications  for  personnel  management.     If  performed 
tasks  are  not  being  trained,  operational  units  perform  inefficiently, 
and  personnel  time  on  the  job  will  have  to  be  devoted  to  task 
learning.     If  schools  are  training  tasks  which  are  not  performed, 
training  resources  are  being  wasted,  and  some  students  may  fail 
in  the  course  because  of  an  inability  to  learn  tasks  irrelevant 
to  performing  the  job. 
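
The two kinds of misfit described above amount to a comparison of two task sets. The following Python sketch shows that comparison under stated assumptions: the function name and the task names are hypothetical and are not drawn from the Coast Guard task lists.

```python
def curriculum_fit(trained_tasks, performed_tasks):
    """Split tasks into the fit categories discussed in the text."""
    trained = set(trained_tasks)
    performed = set(performed_tasks)
    return {
        # Training and job agree: the task is both taught and done.
        "trained_and_performed": trained & performed,
        # Job tasks the school does not cover; learned on the job instead.
        "performed_not_trained": performed - trained,
        # School content with no job counterpart; training resources wasted.
        "trained_not_performed": trained - performed,
    }


# Hypothetical task names, invented for illustration only.
example_trained = {"task_1", "task_2", "task_3"}
example_performed = {"task_2", "task_3", "task_4"}

if __name__ == "__main__":
    for category, tasks in curriculum_fit(example_trained, example_performed).items():
        print(f"{category}: {sorted(tasks)}")
```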


Procedures 


The Coast Guard provided us with task lists defining the job
activities for each of the three specialties. To evaluate the
fit between training content and job activities, we collected
three types of data.

First, we asked several instructors from each school to indicate
whether each task was trained in the "A" schools and if so, how
directly it was emphasized in the curriculum. To make this task
more manageable, we first divided the school curriculum outline
into approximately 50 homogeneous topics. Each "A" school instructor
was asked to indicate how much emphasis was given to each task in
those curriculum topics which he personally taught. The rating
scale used for this task is shown in Figure 1. This scale has
four basic levels of training: a zero or one rating indicates
that a task is not trained; a two or three rating indicates that
some information is presented in training that may be tangentially
related to task performance, but task proficiency is not directly
addressed; a four or five rating indicates that the training is
directed toward task performance, but that performance is not
completely developed; and a six or seven rating indicates that
the training directly and specifically develops task proficiency.
Two rating values were provided within each of these four basic
levels to allow raters to reflect minor differences in task
emphasis within level. In our curriculum evaluation analysis, we
judged that a task receiving a curriculum rating of higher than
three was "trained", and that tasks receiving ratings of three or
lower were "not trained".

The  second  set  of  ratings  we  obtained  was  also  aimed  at  identifying 
which  tasks  are  trained.     In  this  rating,  we  asked  ten  graduating 
students  from  each  "A"  school  to  indicate  whether  they  could 
perform  each  of  the  tasks  defining  their  specialty.    Tasks  they 
could  perform  were  considered  "learned",  although  some  of  these 
tasks  may  have  been  learned  prior  to  "A"  school. 

Instructions for Curriculum Element
Contribution Ratings

Attached is a list of about 50 curriculum elements or course topics
covered in your A school. First, review the list and identify those
curriculum elements which you are currently teaching. In the upper
right hand corner of the attached rating form are spaces for the
numbers of up to 17 curriculum elements. Write the numbers of the
curriculum elements you teach in these spaces. It may be helpful if
you write the name of each curriculum element in the area above its
number as well.

Now, consider only those tasks which you did not check in your first
rating, and only those curriculum elements which you are currently
training. We want to know how much each of your curriculum elements
contributes to developing proficiency in each task. Use the following
scale to estimate the contribution to task proficiency made by each
curriculum element:

     Unrelated

     Contributes little or only indirectly

     Makes a direct contribution or is a
     prerequisite to task proficiency

     Directly develops nearly complete task proficiency

FIGURE 1. Rating scale for making task training emphasis ratings.


The third data set we collected included ratings of whether each
task was performed on the job, and if so, the relative time spent
performing the task, task difficulty, and task criticality. We
chose these rating factors because we felt that strong curriculum
emphasis on a task could be justified if the task were time
consuming, or critical, or difficult. Our own experience with
earlier task inventories suggested that time spent and criticality
were sufficiently independent to warrant measuring the two factors
separately. Task difficulty has usually been assessed in the
past by estimating the amount of training required for task
proficiency, but since in this case we were evaluating whether
the amount of training was indeed appropriate, we sought an
independent estimate of task difficulty, without referring to
training requirements. Ratings on all three factors were obtained
from senior enlisted personnel who supervised E-4s in the three
specialties. The raters were asked to consider the activities
performed by the E-4s under them, and to rate each task's time
spent, criticality, and difficulty using 9-point relative scales
ranging from "much below average" to "much above average". Since
training emphasis could be justified for time consuming or critical
or difficult tasks, each task was assigned a value equal to the
highest mean rating for any of the three rating factors, and
tasks with mean rating values greater than 3.0 ("below average")
were considered "performed".
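
The rule just described (take the highest of the three mean ratings,
then compare it with 3.0) can be sketched as follows. This is only an
illustration under stated assumptions: the function names and the
sample ratings are hypothetical and do not come from the study.

```python
from statistics import mean

# Cutoff stated in the text: a task value greater than 3.0
# ("below average" on the 9-point scale) counts as "performed".
PERFORMED_CUTOFF = 3.0


def task_job_value(time_spent, criticality, difficulty):
    """Return the highest of the three mean supervisor ratings for one task."""
    return max(mean(time_spent), mean(criticality), mean(difficulty))


def is_performed(time_spent, criticality, difficulty):
    """Apply the 'greater than 3.0' rule to the task's job value."""
    return task_job_value(time_spent, criticality, difficulty) > PERFORMED_CUTOFF


# Hypothetical ratings from three supervisors for a single task.
example_value = task_job_value([2, 3, 2], [6, 7, 5], [4, 4, 3])
```

Taking the maximum rather than an average means that a task which is
rarely performed but highly critical still counts as a job requirement.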

These data allowed us to perform three curriculum content
validation analyses. First, by correlating mean ratings of
training emphasis for each task with mean time spent, difficulty
or criticality ratings, we were able to evaluate the extent to
which the training emphasis profile matched the job rating profile
across all tasks.
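
As an illustration of this first analysis, the sketch below computes a
correlation across tasks between mean training emphasis and mean job
ratings. The paper does not name the coefficient used; a Pearson
product-moment correlation is assumed here, and the function name and
sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of task means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 tasks")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-task means: training emphasis (0-7 scale) and the
# task job value (9-point scale), one entry per task.
training_emphasis = [6.5, 1.0, 4.0, 2.5]
job_value = [7.0, 2.0, 5.5, 3.0]
profile_match = pearson_r(training_emphasis, job_value)
```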

Second, by referring to simple ratings of whether tasks were
trained in the "A" schools, whether "A" school graduates could
perform tasks, and whether E-4s in the field were required to
perform tasks, we were able to draw several conclusions about
curriculum quality. If tasks were trained but not performed,
they were considered "over trained". If tasks were rated as
being performed on the job, and as being trained, but recent "A"
school graduates indicated they could not perform them, they were
considered "not learned". Finally, if tasks were performed on
the job, but were not trained in the school and were not already
within the repertoire of graduating students, they were flagged
as "not trained," and should be considered for inclusion either
in the "A" school curriculum or in some other Coast Guard training
program such as basic recruit training.
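
The three flagging rules in this second analysis amount to a small
decision procedure. The sketch below is a minimal rendering of those
rules, assuming each task has already been reduced to three yes/no
values; the function name and the "no flag" label are hypothetical.

```python
def flag_task(trained: bool, learned: bool, performed: bool) -> str:
    """Classify one task from its three dichotomous ratings.

    trained   -- instructors rate the task as trained in the "A" school
    learned   -- graduating students report they can perform the task
    performed -- supervisors rate the task as performed on the job
    """
    if trained and not performed:
        return "over trained"
    if performed and trained and not learned:
        return "not learned"
    if performed and not trained and not learned:
        return "not trained"
    # Everything else is treated as an acceptable training-job fit,
    # including tasks learned before "A" school.
    return "no flag"
```

For example, `flag_task(True, False, True)` returns "not learned",
the case of a task that is taught and needed but that graduates say
they cannot do.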

The third validation analysis was similar to the second, but was
based on continuous ratings of training emphasis and task time
spent, criticality, or difficulty instead of the dichotomous
performed-not performed and trained-not trained ratings. Again, the
analysis identifies tasks that are trained (rated above 3.0 in
training emphasis) but not performed (rated below 3.0 in time
spent, criticality, or difficulty).

Results 

Initial  data  analyses  showed  most  rating  reliabilities  were  quite 
high.  Interrater  agreement  on  most  factors  was  in  the  upper  .80s 
or  lower  .90s,  with  approximately  ten  raters. 

Correlations between dichotomous ratings of whether tasks are
trained and whether tasks are performed are acceptably high, as
shown in Table 1. For the dichotomous ratings (trained-not
trained and performed-not performed) these correlations are .89
for RM, .84 for AT, and .94 for DC "A" school curricula. For the
continuous training emphasis and job task factor ratings the
correlations between school and job ratings are .74 for RM, .65
for AT, and .82 for DC "A" schools. These values suggest that
the schools are training those tasks regularly performed on the
job, and that the most time consuming, difficult, or critical
tasks receive the most emphasis in the "A" school curricula.


Table 1

Correlations Between Task Training Emphasis and
Task Job Requirements for Three Coast Guard Jobs

                    Training-Job        Training-Job
        Number      Correlation for     Correlation for
Job     of Tasks    Dichotomous Data    Continuous Data

RM        403            .89                 .74
AT        327            .84                 .65
DC        477            .94                 .82


Figure 2 shows some of the curriculum validation results we have
obtained thus far for one of the three specialties. The data
shown in this figure are proportions of training raters who
indicated  each  task  was  trained,  the  proportions  of  graduating 
students  who  reported  they  could  perform  the  tasks,  and  the 
proportions  of  supervisors  who  indicated  the  tasks  were  performed 
on  the  job.    We  more  or  less  arbitrarily  decided  that,  for  this 
illustration,  any  task  rated  as  "trained"  by  ten  percent  or  more 
training  experts  would  be  considered  "trained",  and  any  task 
rated  as  "performed"  by  more  than  30  percent  of  the  supervisory 
raters  would  be  considered  "performed".    We  felt  these  values 
would produce a "conservative" picture of training-job fit. If
as few as 10 percent of instructors indicated a task was trained,
and as many as 70 percent of supervisors indicated it was not performed,
the task would be flagged as potentially overtrained. The first
two  tasks  in  Figure  2  are  examples  of  a  good  fit  between  job 
requirements  and  training  content:    where  job  demand  is  high, 
training  emphasis  is  high,  and  where  the  job  demand  is  low,  tasks 
are  not  trained.     (The  60  percent  of  trainees  who  report  they  are 
able  to  prepare  shipyard  overhaul  requests  in  task  2  have  been 
exposed  to  the  required  forms  in  their  careers  prior  to  "A" 
school.) Of the 477 tasks in this specialty, 452 showed this
kind  of  fit.    The  remaining  25  tasks  in  Figure  2  are  those  that 
were  flagged  for  consideration  by  "A"  school  personnel. 
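
The two cutoffs used for this illustration can be applied as in the
sketch below. The cutoff values and the three example rows are taken
from the text and from Figure 2 as far as they can be read; the
function and variable names are hypothetical.

```python
TRAINED_CUTOFF = 0.10    # "ten percent or more" of training raters
PERFORMED_CUTOFF = 0.30  # "more than 30 percent" of supervisory raters


def dichotomize(prop_trained, prop_performed):
    """Turn rater proportions into (trained, performed) yes/no values."""
    trained = prop_trained >= TRAINED_CUTOFF
    performed = prop_performed > PERFORMED_CUTOFF
    return trained, performed


# (proportion rated trained, proportion rated performed on the job)
rows = {
    1: (1.00, 0.90),   # high job demand, trained
    2: (0.00, 0.10),   # low job demand, not trained
    12: (1.00, 0.30),  # trained, but job proportion does not exceed .30
}
results = {task: dichotomize(*props) for task, props in rows.items()}

# 452 of the 477 tasks showed a good fit, leaving the flagged tasks.
flagged_count = 477 - 452
```

Note that the two comparisons are not symmetric: a task exactly at the
training cutoff counts as trained, while a task exactly at the job
cutoff does not count as performed, which is why task 12 is treated as
trained but not performed.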

Tasks that are trained but not performed are flagged in the
fourth column as "over trained" (e.g., tasks 12, 16, and 32).
"A" school personnel must consider these tasks and decide whether
they actually are being trained, and if so, why they are trained
given their low contribution to the job. Some tasks that were
flagged as over trained appear to be errors by training raters
more than curriculum faults. Thus, in task 32, since students
learn to read diagrams and blueprints in this school, the training
raters felt they were contributing to the students' ability to
teach blueprint and diagram reading, and indicated that the task
was trained, when, in fact, the curriculum in question does not
include sessions on how to teach others to read blueprints.
Other flagged tasks seem to reflect the "A" school setting.
Thus, if students stood watches during their "A" school assignment,
the instructors indicated that these watchstanding activities
were taught in the school (e.g., tasks 421, 423, and 424).

U.S. COAST GUARD CURRICULUM VALIDATION

DC TASK TRAINING/JOB COMPARISON

TRG PROP CUTOFF = .10       JOB PROP CUTOFF = .30

                                                  N=3    N=10   N=10
                                                  PROP   PROP   PROP
                                                  TRND   TRNEE  JOB    OVER     NOT      NOT
TASK                                                     ABLE   PFMG   TRAINED  LEARNED  TRAINED

   1  Assign DC personnel to daily tasks          1.00   1.00    .90
   2  Prepare ship-yard overhaul requests          .00    .60    .10
  12  Plan NBC drills                             1.00   1.00    .30    ***
  16  Prepare watch, quarters, and station bills  1.00    .90    .10    ***
  32  Teach reading and drafting blueprints       1.00    [?]    .10    ***

 212  Operate MIG welding equipment                .00    .20    .70                       ***
 213  Operate TIG welding equipment                .00    .20    .60                       ***
 265  Repair portable pumps                       1.00    .00    .80             ***
 266  Repair furniture                            1.00    .00   1.00             ***
 267  Repair ladders and gangways                  .33    .00    .90             ***

 268  Repair ceramic tile                          .00    .00    .80                       ***
 269  Repair tile deck covering                   1.00    .00    .90             ***
 270  Repair sinks                                1.00    .00    .80             ***
 271  Repair flushing units                       1.00    .00   1.00             ***
 272  Repair firemain system                       .00    .00    .90                       ***

 273  Repair pressurized air system                .00    .00    .80                       ***
 274  Repair fresh-water system                    .00    .00    .80                       ***
 275  Repair fixed CO2 system                      .00    .00    .80                       ***
 276  Repair sanitary system                       .00    .00    .90                       ***
 277  Repair piers, camels, floats, ramps, etc.    .00    .00    .60                       ***

 278  Repair lock and key systems                 1.00    .00    .70             ***
 279  Repair roofing                              1.00    .00    .70             ***
 280  Repair minor drainage problems              1.00    .00   1.00             ***

 421  Perform duty as wheel watch                 1.00    .70    .20    ***
 423  Perform duty as loran watch                 1.00    [?]    .20    ***
 424  Perform duty as teletype watch              1.00    .50    .10    ***
 459  Work as diver                                .00    .20    [?]                       ***


FIGURE 2. Tasks flagged in Damage Controlman curriculum content validation analysis.

These kinds of "over trained" tasks do not represent any serious problems
in "A" school content.

Tasks that are performed on the job, and that are rated as being
trained in the school, but which cannot be performed by graduating
students are flagged as "not learned" in column five. For example,
all job raters agree that task 266, repairing furniture, is
performed on the job, and all "A" school raters agree that the
task is trained, but none of the ten graduating students we
sampled felt he could repair furniture. In many cases, such
tasks can be attributed to differences of interpretation; for
example, if instructors teach general woodworking skills and
mention that these principles apply to furniture repair as well
as small boat repair, they may feel they have "taught" furniture
repairing, while students, who can't recall being shown how to
reupholster couches, interpret the same task more narrowly. In
any case, we will ask "A" school people to review these "not
learned" tasks, and to decide whether they imply any weaknesses
in the curriculum.

Finally, tasks that are required on the job, but which are not
trained at the "A" school, and which are not already within the
repertoire of graduating students are flagged as "not trained" in
column six. These represent tasks which are currently being
learned on the job or in later schools. The Coast Guard will
review them to determine whether they should be formally trained
in "A" school, in basic training, or in some other training
operation, to ensure that personnel are being sent to their field
assignments with all the skills they will require on the job.

Summary 

In  review,  we  have  attempted  to  develop  a  procedure  which  the 
Coast Guard can use to evaluate the content validity of its Class
A school curricula. The method provides an overall quantitative
index of the fit between job task requirements and training task
emphasis,  and  pinpoints  specific  areas  of  potential  curriculum 
improvement.    These  specific  problems  will  be  used  to  stimulate 
discussions  with  "A"  school  personnel,  not  to  assail  them  or  to 
condemn  their  programs. 

The  results  for  the  three  specialties  evaluated  in  this  demonstration 
project  suggest  that  Coast  Guard  "A"  schools  are  doing  an  excellent 
job  of  matching  their  course  content  to  critical  job  requirements. 


EXPERIMENTAL EVALUATION
OF A
HIGH TECHNOLOGY TRAINING PROGRAM


Arthur Kahn
Systems  Development  Division 
Westinghouse  Electric  Corporation 


The manufacturing of certain high technology products requires that
the individuals who perform the various operations be highly skilled.
To acquire these skills they are usually exposed to an
extensive training program. However, in performing their part in the
manufacturing process, they must do more than manipulate equipment and perform
highly skilled manual tasks. They must read and interpret drawings,
make a record of their activity on prescribed forms at the appropriate
times during the entire operation, and evaluate their own performance. Usually,
at the completion of the training program, each individual receives a
certificate indicating that he or she has satisfied the performance requirements
of the production line and is qualified to be a production worker.

At the present time in the Multi-hybrid assembly area, the method
for certifying that operators are qualified to work on the production line
after a period of training is the following: After a given period of
training, the instructor starts the student doing production work of a relatively
simple kind. After the student has performed this task for some time, the
instructor examines this effort. If it is satisfactory, it is submitted to
a Q. C. engineer for inspection. This engineer selects random samples of the
effort for inspection. If they satisfy his criteria, the individual who has
produced them is certified. He or she receives a card indicating that he or
she has been certified. This individual is now a full-fledged production
employee and is expected to produce quality work in a timely fashion.

Over the course of time it appeared that, although these individuals
were certified, their performance had deteriorated from the level shown
at the conclusion of the training period.
Although the quantitative information showing the deterioration for each
worker was rather tenuous, the fact that the workers had satisfied the
Q. C. criteria at certification indicated that the operators could perform the
highly skilled tasks satisfactorily. Thus, the question became rather obvious:
Why did the performance deteriorate? The deterioration could be caused by
several factors. The first was poor attitude on the part of the workers. The
second was ineffective training even though they were passing certification.
The third was inappropriate certification procedures. The fourth was poor job
performance aids such as drawings. In the period prior to the formulation of
this study, the job performance aids had been improved by various procedures.
The certification procedures were those outlined in the quality control
documents. These documents specify the quality of performance required, so it
would appear that the certification standards are adequate. Observation of
the individuals at their work stations and discussion with supervisors indicated
that the attitudes of the individuals were not a problem. The individuals
appeared to be paying attention to their work. The analysis therefore suggested
that the source of the difficulty could be either in the training program or
in the individuals selected for the training program.

At first glance it appeared that a test should be designed to select
those individuals who would benefit from training. However, such a test would
involve the difficult task of validation. The task would be a difficult one
because of the perceptual nature of the task that the individual had to
perform. A second question, ancillary to the first, was: Could a test be devised
by which it might be possible to weed out individuals who could not profit
from further training? In both of these questions there was the basic notion
that if the test were to be used as a selection device, there was the
problem of proving the validity of the test. Validation is a
rather costly and unnecessary task under the present circumstances. It might
be cheaper to train all the candidates and eliminate those who could not
master the task. Available information suggests that individuals who find
wire bonding and chip mounting incompatible with their own desires and their own
evaluation of their capabilities usually select themselves out, i.e., they
usually drop out of the training program of their own accord. Therefore, there
is no need for a selection test. A third question was: Could the test be used
to determine what retraining individuals needed, and when? After analysis it
became obvious that these questions could only be answered by empirical data.
However, it was evident that the problem of validity should not be considered.
Thus, the problem resolved itself into an evaluation of the training program.

This report presents the work that was performed
to answer these questions. An experiment was conducted. Its aim was to
evaluate the effectiveness of the training program and to determine whether the
device used for evaluating the effectiveness of the program could be used as
a measuring tool to determine whether individuals had learned all they needed to
learn prior to having their work submitted for certification.

This report covers the procedures used for preparing the subjects,
a description of the subjects, a description of the method used to assure anonymity
of subjects, a description of the substrate, the procedure for conducting the
performance test, and the scoring procedure. The results of the study and
the conclusions that have been derived are discussed.

In order for the program to achieve its aim it was essential that
the cooperation of all the subjects be enlisted. Therefore, an
orientation meeting was held with all of the subjects, the workers in the
Multi-hybrid assembly area. At this meeting the following material was read to them:

"This program has been prepared to evaluate a scheme for determining
whether an individual has obtained from the training program all that the
individual has been expected to obtain. In this program, everyone will be
given a substrate that has a serial number on it. Since we will not be having
everyone working on this substrate at the same time, we would appreciate your
not discussing the substrate or the work with your co-workers. We should like
to have you work to the level A quality rules. We want you to complete the
paperwork as required but it will not be necessary to put your own initials.
In fact, it would be better if you made up initials. The important part is
that the initials appear in the correct number of places. No one will be able
to identify the name of an individual with the performance on any substrate.
This will be accomplished in the following manner:

"Each name has a number assigned to it, and I have the list. We will
select the first eight who will do the work shortly, by drawing the eight names
out of a hat. Each slip will have a name and number on it. The individual who
is selected will then put this number in the upper right hand corner of the
control tag.

"After you complete the work that is required, the substrate will be
checked by people selected by the Q. C. engineer. You will give the completed
substrate to Mrs. Chavis (the instructor). Mrs. Chavis will give it to the
engineer and he will give it to an inspector. The inspector will complete a
check sheet on which he or she will indicate what he or she found acceptable
or not. These sheets will then be given to me. I will assign the score values
and calculate total scores as a percentage of the total.

Figure 4. Substrate Drawing


"After the experiment is completed, I will tell all of you who are
interested the general results in numerical terms without talking about any
single individual. If the experiment is successful, we intend to use the
substrate to determine when a trainee is ready for certification or in need
of more training. It is also planned to use it to determine when individuals
are in need of retraining and what phase of the training is involved.

"If you have any questions while working on the substrate, please
ask Mrs. Chavis or Chuck Luedtke (the designer of the experimental substrate).
Please work as quickly as you can but remember the emphasis should be on
quality.

"Do you have any questions?"

The subjects were experienced wire bonders and chip mounters who were
labor grade 7's. They were all female. All except two were experienced in
both wire bonding and chip mounting. Of these two, one was a chip
mounter and the other a wire bonder. Although all were experienced personnel
at the time the study began, some were performing labor grade 7 tasks
other than wire bonding and chip mounting, e.g., 100X inspection.
Figures 1, 2, and 3 show individuals performing the chip mounting and wire
bonding tasks. The individuals were selected at random until all had completed the
test substrate. The only limiting factor in this procedure was the number of
individuals who were selected at one time. For example, in the first group
there were eight people on the first shift and two on the second shift; in the
second batch there were four on the first shift and two on the second shift.
As the number of remaining subjects fell, the group size was reduced, so that
at the end of the study individuals were working by themselves.

All subjects were assigned a number. The only individuals who knew
which number was assigned to each individual were the individual, Mrs. Chavis,
who assigned the numbers, and the experimenter. The only individual who was
able to associate a score with an individual was the experimenter.
The control card that the inspector examined contained only a number. Since the
inspector received the substrate and paperwork from the instructor, it was not
possible for the inspector to associate any substrate with any number or name.
Once Mrs. Chavis gave the substrate to the inspector, she did not examine the
score sheet but submitted it to the engineer who monitored the test to assure
that the tantalum capacitor (a particular component) had been properly mounted.
The inspector's recording sheet contained only the serial designation of the
substrate.

The substrate used for this study was a specially designed unit. It
contained resistors, capacitors, diodes, IC chips, and tantalum capacitors. It was
designed so that the individuals would be required to exercise judgement to
determine the sequence of chip placement as well as the order of placing the
wire bonds. The chips were of different sizes. In addition to the units to
be assembled there were notes on the drawing that had to be followed. Figure 4
is a drawing of the basic substrate. Figure 5 is a photograph of the completely
assembled device. Figure 6 is an enlarged photograph of a section of the
completed device to show the kinds of connections that had to be made.

Prior to the beginning of the experiment, a random selection of the
individuals who were to be subjects in the experiment was given a number of
parts and told to separate the good units from defective units. At this time,
they did not know that the units would be used in the experiment. This task
was accomplished at 100X magnification. Those parts that were considered good
by this process became the parts that were used in the assembly of the substrate.

The subjects were provided with (1) a substrate, (2) a drawing showing
how the unit was to be assembled (Figure 4), and (3) a package containing parts
selected following the procedures previously described. They also received a
data package that contained the necessary paperwork that normally accompanies
a job. This package contained a serialized inspection control tag, a
continuation control tag, a serialized allowable rework tag, and a serialized part
traceability tag. After receiving these materials, the individual proceeded to
chip mount and wire bond as required. They were instructed to work until they
completed the task. If they had any questions of any kind, they were instructed
to bring the questions to the instructor. She would either answer the question or
provide whatever was required. The subjects were also instructed to bring the
completed substrate to the instructor. Since the process required "curing"
(placing the substrate in a furnace for a period of time), they were
instructed to bring the substrate to her at the appropriate
time. The substrates were then returned to the individual so that the tantalum
capacitors could be mounted after the "curing."

After the individual had completed the mounting of the tantalum
capacitors she gave the completed substrate and associated paperwork to the
instructor, who then gave the material to the inspector who had been assigned
to the program. This individual then evaluated the substrate in accordance
with level A Q. C. criteria, and recorded the result on the evaluation
sheet that had been prepared. Figure 7 is the evaluation sheet for the chip
mounting task. Figure 8 is the evaluation sheet for the wire bonding part of
the task. The inspector wrote the word "yes" in the blank if the criteria
had been satisfied and "no" if they had not been satisfied. After the inspector
had evaluated the substrate and had recorded the results, the substrate and
paperwork were returned to Mrs. Chavis. She gave the substrate and paperwork to
the engineer, who had the resistance of the tantalum capacitor mounting
measured by the test section personnel. A resistance of less than 0.2 ohms was
required. The result of this test was recorded on the evaluation sheet. The
purpose of this test was to determine whether the capacitors had been properly
mounted. The completed evaluation sheets, paperwork, and substrate were then
delivered to the scorer, who proceeded to assign a numerical value to the
evaluation that had been accomplished.

At the time the evaluation sheets for both the chip mounting and wire
bonding tasks were developed, each item on the evaluation sheet was given a
point value by the manufacturing engineer. The assigned value depended upon his
judgement of the importance of the task being evaluated. These points were
recorded on a separate sheet. They did not appear on the sheet on which the
inspector recorded his results. After each item on the sheet had been scored
according to the corresponding point value, the total number of points was
obtained for chip mounting and for wire bonding. If the individual performed
all tasks correctly, he or she obtained 132.5 points on the chip mounting task
and 134.5 points on the wire bonding task. The final score obtained in each
case was the actual number of points obtained divided by the number possible,
expressed as a percentage.
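The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the maximum point totals (132.5 and 134.5) come from the text, but the function names, the dictionary layout, and the item names and point values in the usage note are hypothetical, since the engineer's actual point values are not reproduced in the paper.

```python
# Maximum points per task, as stated in the text.
MAX_POINTS = {"chip_mounting": 132.5, "wire_bonding": 134.5}


def score_sheet(item_points, results):
    """Sum the point values of every item the inspector marked "yes".

    item_points: hypothetical mapping of sheet item -> point value
    results:     mapping of sheet item -> "yes" or "no"
    """
    return sum(pts for item, pts in item_points.items()
               if results.get(item) == "yes")


def percent_score(points_earned, task):
    """Express the points earned as a percentage of the points possible."""
    possible = MAX_POINTS[task]
    if not 0 <= points_earned <= possible:
        raise ValueError("points_earned must lie between 0 and the maximum")
    return 100.0 * points_earned / possible
```

For instance, with two made-up items worth 2.5 and 4.0 points, a sheet marked "yes" and "no" respectively earns 2.5 points; a wire bonding sheet earning 67.25 of the 134.5 possible points would score 50 percent.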
CHIP MOUNTING SHEET

Serial No. of Substrate

Eutectic Mounting
Components: Q1, Q2, Q3, Q4, Q5, Q6, Q7, CR1, CR2, CR3, CR4, CR5, CR6
Criteria (scored for each component):
  Orientation to 7AB assy dwg
  75% eutectic flow around chip, enough room to wire bond and to dep.
  No damage (includes no chip outs, damage metal eutectic material on top chip)

Adhesive Mounting
Components: U1, U2, U3, U4, U5, U6, U7, U8, U9; CR1 thru 3; CR4 thru 6; C2;
  C1, C3, C4; Q1 thru 3; Q4 thru 7; R1 thru 6; R7 thru 12; R14 thru 17; TABS
Criteria (scored for each component):
  Epoxy visible around 75% of chip
  Correct epoxy used
  Mech. damage
  Excess epoxy
  Wrong orientation, location or missing

Conductive Adhesive
Components: U1, U2, U3, U4, U5, U9
Criterion: No shorts by conductive epoxy

General
  A  All lot entries made
  B  U8 mounted per RN
  C  Dielectric paste over track
  D  Q7 and Q6 mounted to track shorting to track (test to verify)
  E  No mechanical damage to subs.
  F  Paperwork correct

Components: C5, C6, C7, C8
Criteria (scored for each component):
  Joint resistance less than 2 ohms
  Satisfy visual criteria
  No epoxy shorting
  No damage to assembly

Indicate acceptance by:   Y - YES    N - NO

Figure 7. Scoring Sheet
WIRE BONDING SHEET

Serial No. of Substrate

Correct from Drawing (score bonded correctly, each)
Components: U8; U9; U1 to U7; CR5 & 6; CR3 & 4; CR1 & 2; Q3 & 4; Q5 & 6; Q7;
  Q1 & 2; R1 to 6; R7 to 12; R14 to 17; C1; C2; C3; C4; Jumper 1; Jumper 2; TABS

General
  Wired so tantalums can be mounted
  Wire loops uniform appearance
  R1 and R4 not bonded
  Wires not mashed
  R11 bonded from a center tap
  No mechanical damage to chips or substrate

Penalties
  One pigtail or spur not removed
  Two pigtails or spurs not removed
  Three pigtails or spurs not removed
  Four or more pigtails or spurs not removed

Ball Bond Placement
  More than 75% of ball on pad
  Ball in fillet
  Ball bond short
  Sliding bonds

Components: U1, U2, U3, U4, U5, U6, U7, U8, U9; Q1, Q2, Q3, Q4, Q5, Q6, Q7;
  R14 to 17; R1 to 12; C1, C3 & C4; C2; CR1 & 2; CR3 & 4; CR5 & 6
Criteria (scored for each component):
  Bonded correctly
  Wire placement
  Ball configuration
  Stitch configuration
  Wire length
  Wire damage

Indicate acceptance by:   YES    NO

Figure 8. Scoring Sheet
The results of the study are the percentages of total points that
were obtained by the individuals in the wire bonding and chip mounting tasks.
Table I shows the mean and standard deviation for each task and for the
average of both tasks.

Table I

Average Performance Measures and Their Variability
(Percent of Total Points)

                   Mean     S.D.
Chip Mounting      87.33    5.46
Wire Bonding       85.33    7.19
Average            86.33    5.53

The median score was 87.5%.

Each score sheet was examined to see if a pattern of errors could be
extracted from the results. The analysis of the wire bonding task showed that
88% of the individuals did not bond R11 correctly; 42% did not have wire loops
with uniform appearance; 33% did not remove one pigtail; and 37% were criticized
for having mashed wires. These were Q. C. criteria that they had to keep in
mind. A similar analysis was made of the chip mounting task. There were two
different mounting methods required. The error rate of Eutectic mounting (an
alloy forming technique) was approximately 20% while the error rate of adhesive
mounting was 1%. 88% of the people did not complete the paperwork correctly.
This value represented 21 of the 24 individuals. 96% did not mount U6 correctly
as required by the RN (Engineering Change). Almost 50% of the people did not
use the correct epoxy in mounting U6.
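The item-by-item tally described above amounts to counting, for each criterion, the share of sheets marked "no". A minimal sketch follows; the function name and the list-of-strings layout are assumptions made for illustration, and only the 21-of-24 paperwork count is taken from the text.

```python
def error_rate(results):
    """Percentage of score sheets on which a criterion was marked "no"."""
    if not results:
        raise ValueError("no results to tally")
    failures = sum(1 for r in results if r == "no")
    return 100.0 * failures / len(results)


# Paperwork criterion as reported: 21 of the 24 individuals marked "no".
paperwork = ["no"] * 21 + ["yes"] * 3
print(f"{error_rate(paperwork):.1f}%")  # 87.5%, which the text rounds to 88%
```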

Although the inspector found many instances of mechanical damage,
the error rate was relatively small.

The average score of 86+ indicates that, as a group, the individuals
performed very well. When consideration is given to the fact that a
majority of individuals lost points because the paperwork was not correct and/or
they did not comply with the RN, the average would otherwise have been greater
than 90. This analysis suggests that the individuals were capable of performing
the chip mounting and wire bonding tasks very well. The major problems other
than the RN in the wire bonding task are related to meeting Q. C. criteria. 42%
did not meet the uniform appearance criteria and 37% were criticized for
mashed wires and mechanical damage. Wires can get mashed
during the movement of the substrate and/or the movement of the capillary with
the wire. At the same time it is possible for chips to be damaged. One of the
inspector's tasks was to discover mechanical damage. An examination of the
inspector's findings and the substrates indicates that it is possible that some
of the points lost because of mechanical damage were due to defective parts
in the kits even though all bad units were supposed to have been removed.
These findings suggest that the individuals who made the determination were
not sufficiently aware of the criteria for rejecting defective components.
That this state of affairs should exist is not unexpected since these
individuals have not had any organized training in the recognition of defective
units at 100X. It would therefore appear that training in the recognition
of defective material should become part of the training of these individuals.


The results suggest that more attention during training should be
given to the correct accomplishment of the necessary paperwork, the careful
reading of drawings, calling attention to the information in the notes, and
assuring that the proper drawings and/or revision notices are available. It
is the responsibility of the individual worker to determine whether the
paperwork indicates that the proper drawings are at hand. In this experiment,
very few individuals recognized that an RN was required. It has been argued that
the individuals had been instructed not to ask questions. They were in fact
told not to ask their associates, but they were encouraged to ask Mrs. Chavis
if something were amiss.

In any event, the importance of training effort on paperwork became
evident in the course of an extraneous exercise that occurred independent of
the experiment. Mrs. Chavis was instructing an individual while the
experiment was being conducted. During this period, the individual completed the
instruction and completed the certification process satisfactorily. Mrs. Chavis
then gave this individual the test substrate as an experiment and as an
innovative part of her training program. At the completion of the substrate she
asked the individual what she had learned. The individual responded with the
following remarks: (1) "Arranging and locating the chip on the plain substrate
without metallization." (2) "Using your footnotes and flags." (3) "Making your
revision." (4) "Understanding your paperwork." (5) "Listing lot numbers."
(6) "Checking orientation when mounting transistors." This ad hoc experiment
suggests that the finding of deficient paperwork in the results is not an
artifact but is a true representation of the performance of the individuals. It
suggests that either the individuals are not aware of what is correct or they
have been inattentive to this aspect of the task. Since so many individuals
made paperwork errors, it does not seem reasonable to attribute the errors to
inattention.

This extraneous "experiment" also shows that the beginning chip mounter
needs training in arranging chips when the substrate does not provide metallized
tabs on which to mount them.

Earlier paragraphs have indicated that individuals have had little
formal training on the recognition of defective parts at 100X. Similarly,
they receive no training in the general examination at 100X. It would seem
unreasonable to have individuals perform work at 30X, then have it examined at
100X, and then criticize the individuals for not performing good work when they
are unable to tell good from bad at 30X because the defect only becomes visible
at 100X.

The results also show that there is little difference in performance
between chip mounting and wire bonding, the basic tasks of these individuals.
The difference is slightly less than two percentage points. However, the
error rate for Eutectic mounting was about 20 times that of the adhesive
mounting. This difference could have been due to the fact that the chips used
were training chips. However, similar chips were used in both processes. It is
conceivable that the nature of the Eutectic mounting is such that the
judgements the individuals have to make and the variability that could occur in
the equipment, as contrasted to that which could occur in adhesive mounting,
could account for this difference. The data show that 96% did not use the
correct epoxy to mount U6. This error was a result of not requesting the RN.

The analysis thus far suggests that the training program should be
changed in three fundamental ways. The first fundamental change should be
more emphasis on correct examination of the paperwork and on the
correct manner of completing all the paperwork. There is no reason
why the individual who can perform the complex task of wire bonding and chip
mounting should make mistakes on paperwork. The individuals during training
should be given practice completing the paperwork during separate exercises
and then this kind of practice should be integrated into their work completing
substrates. The second fundamental way the training should be changed is that
additional training substrates should be developed and used. At least two
should be developed, one more difficult than the other. These should include
the use of RN's and the correct completion of paperwork. The third way is
that the individuals should be given more practice in discriminating good from
bad work.

The final significant item of the results was the fact that 47% of
the individuals failed the test of the tantalum capacitors. Although the units
passed the visual inspection, they did not pass the electrical test. This test
demonstrates whether the mounting of the capacitor is sufficiently complete
that there is little electrical resistance in the bonding materials. This
result suggests that a way must be found to allow the operators to
evaluate the quality of the electrical connection by the amount and
distribution of epoxy.

From the learning theory point of view, the results indicate that it
is not sufficient to give the individual practice in the task that must be
performed; the individual must also gain experience of performing this task in
a different context. This means that the individual learns to vary the basic
task as a function of different requirements from trial to trial. This
state of affairs existed when Mrs. Chavis, the instructor, used the test
substrate during her training program with new employees. It would therefore
appear that the individuals need to get practice in the basic task but they
must also develop a large repertoire of responses before they can be successful
production workers.

This particular experimental program demonstrated that a properly
developed testing program will not only indicate where training programs can
be improved but will also provide a mechanism for applying what is known about
changing human behavior.

SECTION 14

INSTRUCTIONAL EVALUATION AND TEST DEVELOPMENT

THE INSTRUCTIONAL QUALITY INVENTORY: INTRODUCTION AND OVERVIEW


John  A.  Ellis,  Wallace  H.  Wulfeck  II, 
Navy  Personnel  Research  and  Development  Center 

Robert  E.  Richards,  Norman  D.  Wood 
The  Pennsylvania  State  University 

and 

M.  David  Merrill 
Courseware,  Inc.,  San  Diego,  Ca. 


INTRODUCTION 

Problem 

Modern  military  instruction  is  developed  according  to  a  systematic 
method  called  Instructional  Systems  Development  (ISD).    The  order  of 
development  involves: 

1. Job/task analysis leading to specification of training objectives;

2. Development of tests to measure student progress toward the
objectives;

3. Design of new instruction and/or adaptation of existing instruction
to achieve the objectives;

4. Implementation of the training program;

5. Evaluation and feedback for course maintenance.

Various military activities are using this model to develop many
of their training courses. There is a need in ISD for quality control
and/or evaluation procedures so that (a) quality can be maintained
throughout instructional design so that errors early in development are
not magnified as development proceeds, (b) existing materials can be
evaluated with respect to newly derived training objectives for purposes
of adaptation or revision, (c) deficiencies in performance of course
graduates can be traced to possible deficiencies in instructional materials,
and (d) instructional materials obtained through contract efforts can be
evaluated for purposes of acceptance.

Purpose 

The  purpose  of  this  research  and  development  effort  was  to  develop 
quality control/evaluation procedures, for use by military instructional
design  and  development  personnel,  for  the  three  main  products  of  an 
instructional  development  effort,  namely  objectives,  tests,  and 
instructional  materials  or  presentations.   The  Instructional  Quality 
Inventory  (IQI)  is  the  result  of  this  effort. 

Using  the  analysis  procedures  of  the  IQI  to  rate  the  consistency  and 
adequacy  of  an  instructional  program,  and  making  revisions  on  the  basis 
of  these  analyses,  can  greatly  reduce  the  time  and  effort  needed  to  validate 
and  revise  an  instructional  course  or  system.    However,  although  the  IQI 
can  reduce  the  need  for  validation  on  real  students,  it  does  not  entirely 
eliminate  the  need  for  empirical  tryouts. 

The IQI is a method for product evaluation, not process evaluation.
Regardless  of  the  development  methodology  used  to  produce  the  objectives, 
tests,  or  instruction,  the  IQI  can  be  used  to  evaluate  the  quality  of 
the  products.    The  IQI  criteria  can  be  kept  in  mind  during  the  development 
of  instruction,  but  the  IQI  is  intended  as  a  supplement  to  ISD,  not  a 
replacement  for  it. 

The  IQI  is  intended  for  use  by  people  with  knowledge  of  ISD;  it 
cannot  be  used  by  untrained  personnel.    Also,  the  application  of  the  IQI 
depends  upon  a  good  task  analysis,  or  the  availability  of  subject-matter 
experts,  and  preferably  both.    This  is  because  the  IQI  assumes  that  what 
needs  to  be  taught  has  already  been  determined. 

Organization  of  this  paper 

The  following  section  of  this  paper  is  an  introduction  to  the  IQI 
procedures.    It  is  designed  to  acquaint  managers  of  instructional 
development  efforts,  evaluators  of  instruction,  contract  monitors,  etc., 
with  the  IQI.    While  it  provides  a  substantive  overview  of  the  IQI 
process,  it  is  not  a  complete  IQI  training  program. 

There  are  three  other  volumes  in  the  IQI  series  which  will  be 
available  in  early  1979.    These  are 

1.  a  User's  Manual,  which  contains  a  complete  description  of 
all  IQI  procedures,  and  examples  of  their  use. 

2.  a  Training  Workbook,  which  contains  additional  examples  and 
practice  on  the  IQI  procedures. 

3.  a  Job  Performance  Aid,  which  contains  a  brief  version  of  each 
IQI  procedure. 
INSTRUCTIONAL QUALITY INVENTORY PROCEDURES


THE CLASSIFICATION SYSTEM:

The following classification system is used in all IQI procedures.
It is applied to the three main parts of instruction: objectives,
tests, and instructional presentations.

Each objective, test item, or piece of presentation can be classified
according to:

1. What the student must do, i.e., the TASK to be performed, and

2. The type of information the student must learn, i.e., the
instructional CONTENT.


CONTENT 


TASK 


In  the  IQI,  these  two  classification  dimensions  have  been  combined 
to  form  the  TASK/CONTENT  MATRIX. 
THE TASK DIMENSION:

There are two main TASKS a student can perform:

1. He can REMEMBER information, or

2. He can USE the information to do something.


REMEMBER
USE


EXAMPLE:

Here are two test items:

1. The symbol for resistor is ______.

2. Using your knowledge of electronic theory, what would happen
in the circuit shown below if the load resistance were
shorted?


These two test items differ with respect to what the student is
supposed to do (TASK). In number 1, the student has to REMEMBER
something, and in number 2, the student has to apply or USE his
knowledge in a new situation.
THE CONTENT DIMENSION:


There are five types of CONTENT:


FACT        CONCEPT        PROCEDURE        RULE        PRINCIPLE


remember


use


FACTS are simple associations between names, objects, symbols,
locations, etc.

CONCEPTS are categories or classifications defined by certain
specified characteristics.

PROCEDURES consist of ordered sequences of steps or operations
performed on a single object or in a specific situation.

RULES also consist of ordered sequences of operations, but can
be performed on a variety of objects or in a variety of situations.

PRINCIPLES involve explanations or predictions of why things
happen in the world. That is, they concern predictions or
interpretations based on theoretical or cause-effect relationships.

NOTE: Facts can only be remembered. The others can be
remembered or used.


EXAMPLES:


The following examples illustrate the five content areas for the
REMEMBER task level:


REMEMBER FACT

1. The symbol for resistor is ______.

2. The student will list the names of the parts in the wind
indicating instrument.

REMEMBER CONCEPT

1. List the defining characteristics of a jet pump.

2. The student will define the various kinds of clouds (cumulus,
stratus, etc.).

REMEMBER PROCEDURE

1. List in order the steps for cleaning an M-16 rifle.

2. The student will describe the procedure for preparing and
sending a radio message.

REMEMBER RULE

1. List the steps involved in finding the rhumb-line course
between two points on the earth.

2. The student will state the general rule for solving for
circuit current, given voltage and resistance.

REMEMBER PRINCIPLE

1. State the principles of electron movement in a semiconductor
junction.

2. The student will recall the reasons why hydraulic fluid
contamination must be avoided.


Facts can only be remembered, but for the other content types, the
student may be asked to USE his knowledge to classify, perform, solve,
or predict. The following are examples of the USE task level for
all content types except facts:


USE CONCEPT

1. Which of the pumps aboard ship are jet pumps?

2. Given photographs of clouds, the student will sort them
according to type (cumulus, stratus, etc.).

USE PROCEDURE

1. Clean an M-16 rifle.

2. The student will prepare and send a radio message.

USE RULE

1. Calculate the rhumb-line course from Pearl Harbor to Long Beach.

2. Given the values for voltage and resistance, the student will
calculate the current flow.

USE PRINCIPLE

1. Describe the theoretical movement of electrons in a PNP
transistor.

2. The student will predict what is likely to occur if the
landing gear fluid were contaminated.
THE USE LEVEL CAN BE FURTHER DIVIDED INTO TWO TYPES:


1. USE-UNAIDED, in which the student has no aids except his own memory.
2. USE-AIDED, in which the student has a job aid for performing the task.

For this level, the nature of the aid depends on the content type:

For USE-AIDED CONCEPTS the aid should consist of a decision
strategy, including each critical characteristic, and the decision
to be made according to presence or absence of that characteristic.
In simple cases, the aid may only include a list of characteristics;
the decision strategy is then implied.

For USE-AIDED PROCEDURES the aid would be a list of steps to be
performed.

For USE-AIDED RULES the aid would be at least a statement of the
formula or rule to be applied, and could include guidelines for
when and how to apply it.

For USE-AIDED PRINCIPLES the aid would also be at least a statement
of the principle, and could include guidelines for when and how
to apply it.


EXAMPLES:


USE-AIDED:

A pilot's preflight checklist is a USE-AIDED
procedure. The pilot does not have to remember the
steps or their order because they are on the
checklist. The pilot does need to perform the steps
correctly.


USE-UNAIDED:

"The student will field-strip an M-16 rifle."
Here, the student must remember the steps in the
correct order, and perform them correctly.


In summary,

the REMEMBER level involves "pure" remembering,

the USE-UNAIDED level involves remembering what is to be used,
and then using it, and

the USE-AIDED level involves "pure" using.
THE ENTIRE TASK/CONTENT MATRIX is shown below:


CONTENT (columns of the matrix):

FACT: Recall or recognize names, parts, dates, places, etc.

CONCEPT: Remember characteristics, or classify objects, events, or
ideas according to characteristics.

PROCEDURE: Sequence of steps remembered or used in a single
situation or on a single piece of equipment.

RULE: Remember or use a sequence of steps which apply across
situations or across equipments.

PRINCIPLE: Remember, or interpret/predict, why or how things
happen, or cause-effect relationships.


TASK (rows of the matrix):

REMEMBER - recall or recognize facts, concept definitions, steps
of procedures or rules, statements of principle.

USE-UNAIDED - tasks which require classifying, performing a procedure,
using a rule, explaining or predicting with no aids except memory.

USE-AIDED - same as USE-UNAIDED, except job aids are available.


Any objective, test item, or piece of instruction will be
classifiable in one and only one cell of the matrix above.

This matrix is used in all IQI steps.
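As an illustration only, the task/content matrix can be written down as a small data structure. The sketch below is not part of the IQI materials; the class and function names are assumptions, and the one rule it encodes is the note stated earlier that facts can only be remembered while the other content types can be remembered or used.

```python
from enum import Enum
from itertools import product


class Task(Enum):
    REMEMBER = "remember"
    USE_UNAIDED = "use-unaided"
    USE_AIDED = "use-aided"


class Content(Enum):
    FACT = "fact"
    CONCEPT = "concept"
    PROCEDURE = "procedure"
    RULE = "rule"
    PRINCIPLE = "principle"


def is_valid_cell(task, content):
    """Facts can only be remembered; other content can be remembered or used."""
    return task is Task.REMEMBER or content is not Content.FACT


# Every (task, content) pair that an objective or test item may occupy.
VALID_CELLS = [(t, c) for t, c in product(Task, Content) if is_valid_cell(t, c)]
```

Under this encoding the matrix has three task rows and five content columns, with the two USE cells for facts excluded.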
OBJECTIVE ADEQUACY:


The first step in the IQI procedure corresponds with the development
of training objectives. The procedure described below is used to
determine if each objective is adequate for further instructional
development.

Objectives are ADEQUATE if they satisfy three general criteria:


1. Is the objective CORRECTLY STATED? Does the objective include statements
of actions the student is to perform after training, the conditions under
which the performance is expected, and the standards which the performance
must meet? If even one of these parts is missing, the objective is
inadequate because training for it cannot be designed or evaluated.

EXAMPLE: Inadequate objective: "The student will prepare a standard
Navy message." This is inadequate because it does not specify
either the conditions (given a typewriter? TTY?) or the
standards (how fast and how many errors).

2. Is the objective CLASSIFIABLE on the task/content matrix? If the
objective cannot be classified, this means that the action the student
is to perform is not stated clearly enough so that we know what the
student is to do. Training cannot be designed or evaluated in these
circumstances.

EXAMPLE: The objective "The student will learn repair procedures for
the XYZ radar set" is not classifiable. It is not clear
whether the student should remember the procedures or actually
use them.

3. Is the "intent" of the objective APPROPRIATE for the purpose of the
course? The actions, conditions, and standards specified in the
objective should be as close as possible to the actions, conditions,
and standards of the task to be performed on the job after training.

In addition, it is assumed that the ultimate "intent" of any
training program is to teach the student how to do something (i.e., USE
level). Therefore, there must be a USE-level objective for each REMEMBER
objective. (Facts are a special case: Although facts are not used, they
often must be taught to provide a knowledge base for a later use-level
task. Therefore, in order to justify teaching facts, they must support
some use-level objective.)

Conversely, USE-UNAIDED tasks should be taught at the REMEMBER
level before being taught at the USE level. Therefore, just as every
REMEMBER objective should have a corresponding USE objective, every USE
objective should have a previous REMEMBER objective.

EXAMPLE: The objective "The student will identify the connection of a
voltmeter to measure the voltage across a component by selecting
an illustration" is not appropriate for the intent of the course.
The student will not see possible illustrations of connections
on the job, but will be required to set up the connection; thus
the action should be revised.
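The first adequacy criterion (an objective must state an action, conditions, and standards) lends itself to a simple mechanical check. The following is a rough sketch under stated assumptions: the `Objective` class and function names are hypothetical, and a real IQI review relies on a trained rater's judgement rather than on whether a field is merely filled in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Objective:
    action: Optional[str] = None      # what the student does after training
    conditions: Optional[str] = None  # circumstances under which it is done
    standards: Optional[str] = None   # criteria the performance must meet


def missing_parts(obj: Objective) -> list[str]:
    """Name the parts of a correctly stated objective that are absent."""
    return [name for name in ("action", "conditions", "standards")
            if not getattr(obj, name)]


def is_correctly_stated(obj: Objective) -> bool:
    """An objective is correctly stated only if no part is missing."""
    return not missing_parts(obj)


# The inadequate example from the text: only the action is given.
navy_message = Objective(action="prepare a standard Navy message")
print(missing_parts(navy_message))  # ['conditions', 'standards']
```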
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TEST  CONSISTENCY  and  ADEQUACY: 


Onae  objeativee  are  adequate^  teat  items  can  be  developed.    The  next 
IQI  step  ia  the  quality  control  step  for  test  development.    This  step 
involves  determining  whether  test  items  are  CONSISTENT  with  objectives^ 
and  whether  each  item  is  ADEQUATE. 

A  test  Item  is  CONSISTENT  with  its  objective  if: 

1.  The  ACTION  (TASK/CONTENT  level)  of  the  test  item  is  the  same  as  that 
of  the  objective. 

2.  The  CONDITIONS  under  which  the  item  is  administered  are  as  close  as 
possible  to  those  of  the  objective. 

3.  The  STANDARDS  in  the  test  item,  or  the  STANDARDS  for  scoring  the  item, 
are  as  close  as  possible  to  the  standards  in  the  objective. 


EXAMPLE:    Objective:    Given  the  necessary  tools  and  arn  operator's 

manual,  the  atudent  will  set  up  and  operate 
a  double-acting  reciprocating  pump^  in  five 
minutes  and  according  to  the  manual  specifications. 

Inconsistent test item: "List the steps of procedure for
starting, operating, and stopping a
double-acting reciprocating pump."

This test item is inconsistent, because
its TASK/CONTENT is REMEMBER-PROCEDURE
instead of USE-AIDED-PROCEDURE. Notice
that the action the student is to perform
in the test is not the same as the action
required in the objective.

Consistent test item: "Use the operator's manual and
necessary tools to set up and operate a
double-acting reciprocating pump. You will pass this
test if you complete this task within 5 minutes,
in accordance with the manual specifications."

This test item is consistent with the
objective. Notice, however, that if
either the conditions or grading standards
had been left out, the item would have
been inconsistent.
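
The three consistency checks above can be sketched as a small comparison routine. This is only an illustration and is not part of the IQI itself: the `Spec` class, the `consistency_problems` function, and the way conditions and standards are written as sets of short phrases are all assumptions made for the sketch. In particular, the IQI asks that conditions and standards be "as close as possible," which is a judgment; the sketch simplifies this to checking that nothing named in the objective is missing from the item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """The three parts compared for an objective or a test item (hypothetical)."""
    task_content: str      # ACTION, e.g. "USE-AIDED-PROCEDURE"
    conditions: frozenset  # what the student is given
    standards: frozenset   # how performance is judged


def consistency_problems(objective, item):
    """Return a list of reasons the item is inconsistent (empty if none)."""
    problems = []
    # Check 1: the ACTION (TASK/CONTENT level) must be the same.
    if item.task_content != objective.task_content:
        problems.append(
            "ACTION: item is %s, objective is %s"
            % (item.task_content, objective.task_content)
        )
    # Check 2 (simplified): every condition in the objective appears in the item.
    missing_conditions = objective.conditions - item.conditions
    if missing_conditions:
        problems.append("CONDITIONS missing: %s" % sorted(missing_conditions))
    # Check 3 (simplified): every standard in the objective appears in the item.
    missing_standards = objective.standards - item.standards
    if missing_standards:
        problems.append("STANDARDS missing: %s" % sorted(missing_standards))
    return problems


# The reciprocating pump example from the text, encoded by hand.
objective = Spec(
    "USE-AIDED-PROCEDURE",
    frozenset({"necessary tools", "operator's manual"}),
    frozenset({"five minutes", "manual specifications"}),
)
list_steps_item = Spec("REMEMBER-PROCEDURE", frozenset(), frozenset())
hands_on_item = Spec(
    "USE-AIDED-PROCEDURE",
    frozenset({"necessary tools", "operator's manual"}),
    frozenset({"five minutes", "manual specifications"}),
)

print(consistency_problems(objective, list_steps_item))
print(consistency_problems(objective, hands_on_item))
```

Under these assumptions the "list the steps" item is flagged on all three checks, while the hands-on item produces an empty list.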
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A test item may be consistent with its objective, but may still not
be  an  adequate  item.  TEST  ADEQUACY  depends  on  a  number  of  criteria 
that  items  must  satisfy. 

After  a  test  item  is  consistent  with  its  objective,  the  test  item  is 
ADEQUATE  if: 

1.  The  item  is  clear  and  unambiguous. 

2.  The  item  does  not  give  away  its  own  answer  or  the  answer  to  any 
other  item  on  the  test. 

3.  The  format  of  the  test  item  is  appropriate  for  the  TASK/CONTENT 
level. 

4.  Other  adequacy  concerns  covered  in  the  IQI  manual  are  met. 

EXAMPLES: 1. "Which of the following..." is ambiguous
because it does not say "choose all that apply"
or "choose the best...."

2. "The steps in the procedure for operating a
jet pump are listed below. Arrange them in
the correct order." This is an inappropriate
format for REMEMBER-PROCEDURE because the
student doesn't have to remember the procedure,
only recognize it.


Note: Recognition items (multiple choice, matching, true-false) are
usually NOT appropriate test formats for REMEMBER-level objectives.
This is because these items do not reflect typical job-performance
requirements.

Multiple-choice, matching, and true-false items are appropriate
for concept recognition, and can be appropriate for USE level objectives
if they are carefully designed. However, for USE level objectives,
"hands-on" performance tests are usually most appropriate.
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PRESENTATION CONSISTENCY:


At this point in the IQI process, objectives are adequate, test items
are consistent with objectives, and test items are adequate. The
next instructional design step is to prepare the instructional materials
or presentations. The next IQI step is to insure that the presentations
are CONSISTENT with the objectives and test items, and are ADEQUATE.


In the previous section, determining test-objective consistency involved
comparing each test item with its related objective. Determining
PRESENTATION CONSISTENCY, on the other hand, involves checking whether
or not each of the INSTRUCTIONAL COMPONENTS required for a given
objective-test item is present. There are different types of
instructional components. In order to insure consistency, the appropriate
components must be present for each TASK/CONTENT level. Not all
task/content levels require all components.

The  Instructional  PRESENTATION  COMPONENTS  are: 

1. STATEMENT: The instruction tells the student something he must learn.

2. EXAMPLES: The instruction shows the USE of content (concept, procedure,
rule, or principle).

3. PRACTICE: The student practices REMEMBERING or USING the content, and
is given feedback.
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PRESENTATION  COMPONENTS: 

STATEMENT Component: The instruction presents a statement of a fact,
a concept definition, the steps of a procedure or
rule, or a statement of a principle.

EXAMPLES: 1. "The characteristics of a typical jet pump include...."
(concept definition).

2. "The procedure for changing a gasket in a check valve is ..."
(procedure definition).

3. "To determine voltage, multiply current by resistance."
(rule statement).

EXAMPLE Component: The student is told or shown how a statement of a
concept, procedure, rule, or principle applies in a
specific case.

EXAMPLES: 1. "The XXYZ pump is a double-acting reciprocating pump
because it has the particular characteristics noted
on the diagram below." (concept example).

2. "Let's see how OHM'S LAW applies in a specific case...."
(rule example).

3. "The Navy's victory at Midway in World War II illustrates
the value of cryptologic intelligence because..."
(principle example).

PRACTICE REMEMBERING Component: The student is asked to supply part or all
of a fact statement, concept definition, the steps
of a procedure or rule, or the statement of a principle.
The student is given FEEDBACK about the correctness of
his answer.

EXAMPLES: 1. "The father of our country is ______?" (Fact)

2. "List in order the steps of procedure for ...." (Procedure)

PRACTICE  USING  Component:    The  student  is  asked  to  use  a  concept  definition, 

procedure,  rule,  or  principle  on  a  specific  case 
to  which  it  applies,  and  is  given  FEEDBACK  about 
the  quality  of  his  performance. 

EXAMPLES: 1. "Classify the following Lofargrams." (concept)

2. "Using the procedure in the tech. manual, disassemble
the ...." (procedure)

3. "Solve the following circuit problems...." (rule)

4. "Predict the effect (sociological and psychological) when
women are assigned to Navy ships." (principle).


For CONSISTENCY, different components are required for different task levels:

For the REMEMBER level:      a STATEMENT       (no example)       PRACTICE
                                                                  REMEMBERING.

For the USE-UNAIDED level:   a STATEMENT       EXAMPLES           PRACTICE
                             (or a review      (at least          USING.
                             of the            one).
                             statement)

For the USE-AIDED level:     (The aid          EXAMPLES           PRACTICE
                             takes the         WITH AID.          USING WITH
                             place of the                         AID.
                             statement.)

These required components apply across all content types (facts,
concepts, procedures, rules, and principles) for REMEMBERING,
and all except facts for USING. For example, if the objective and
test item called for the student to remember a fact, then the
instruction must contain a statement of the fact to be remembered,
and at least one practice-remembering item with feedback. No
example is required, because it would be redundant with the statement.
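
The component requirements for each task level can be written as a simple lookup. The sketch below is an illustration under stated assumptions: the names `REQUIRED_COMPONENTS` and `missing_components`, and the idea of describing a presentation as a set of component labels, are hypothetical and do not come from the IQI forms.

```python
# Required presentation components for each task level, as listed above.
REQUIRED_COMPONENTS = {
    "REMEMBER": {"STATEMENT", "PRACTICE REMEMBERING"},
    "USE-UNAIDED": {"STATEMENT", "EXAMPLES", "PRACTICE USING"},
    # For USE-AIDED, the aid takes the place of the statement.
    "USE-AIDED": {"EXAMPLES WITH AID", "PRACTICE USING WITH AID"},
}


def missing_components(task_level, content_type, present):
    """Return the required components that the presentation lacks."""
    # The USE levels apply to every content type except facts.
    if content_type == "FACT" and task_level != "REMEMBER":
        raise ValueError("facts are taught at the REMEMBER level only")
    return REQUIRED_COMPONENTS[task_level] - set(present)


# A fact taught with a statement but no practice item is incomplete.
print(missing_components("REMEMBER", "FACT", {"STATEMENT"}))
```

A presentation passes this part of the consistency check when the function returns an empty set. The criteria that each component must meet (completeness, feedback, and so on) still have to be judged separately.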


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CONSISTENCY  also  requires  that  each  required  component  meet  the 
following  criteria: 

1. STATEMENTS must be COMPLETE.

2.  EXAMPLES  must  show  application  of  the  complete  content. 

3.  EXAMPLES  must  match  or  reflect  the  conditions  and  standards  required 
of  the  objective  and  the  test  as  closely  as  possible. 

4.  PRACTICE  must  include  FEEDBACK. 

5.  PRACTICE  must  be  of  the  same  task/content  level  as  the  test  item 
and  objective. 

6. PRACTICE must match or reflect the conditions and standards required
of the objective and the test as closely as possible, or be designed
to help the student gradually learn the final task.

Most of the requirements above are probably obvious, but some are
complicated. COMPLETENESS, for example, requires different
prescriptions for different content types:

For a CONCEPT: "complete" means that all the critical
characteristics of the concept, and their combination, are given.

For a PROCEDURE: "complete" means that all the steps of the
procedure are given in the proper order.

For a RULE: "complete" means that all the steps of the rule
are given in the proper order.

For a PRINCIPLE: "complete" means that all the pre- and
post-conditions, actions, processes, causes, effects, and results are
stated, and the relationship between them is clearly stated.
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PRESENTATION  ADEQUACY 


Once all the required instructional components are present, and each
of these components meets all of the consistency criteria, the
ADEQUACY of the presentation can be assessed. This is done by checking
each instructional component (statement, examples, practice) for
certain characteristics.


A  STATEMENT  is  ADEQUATE  if  it  meets  the  following  criteria: 

1. The statement must be SEPARATED from the rest of the instruction.
This helps the student find the main idea. When the statement is
separated, the key points stand out, and are not buried in the
presentation. There are several ways to accomplish this goal:

a. Set off the statement with boxes.

b. Use a different color.

c. Use a different type, or underline.

d. Place on a separate page, or in a special place on the page.

e. For audio or movies, pause before giving the statement.

2. The statement must be IDENTIFIED. After the statement is separated,
the student should be told what it is. This permits the student's
attention to be focused on the key points and their application, rather
than the student trying to become generally familiar with everything in
the instruction. One way to identify a statement is to use the word
"statement." Other more content-oriented words are even more helpful:

definition        procedure for ____        the principle of ____

Main  Idea:         Key  Point:  General  rule: 


EXAMPLE: 


DEFINITION  OF  OHM'S  LAW: 


(Here, the statement
is separated by the
box, and identified.)


In  addition  to  the  statement,  the  presentation  should  include  something 
to  help  the  student  better  understand  and  remember  the  statement. 
Methods  of  providing  this  help  include: 

a.  Giving  a  MNEMONIC  (memory  trick). 

b. Giving a general example of how the statement can be used.

c. Explaining why the statement is important.

d. Explaining how it came about, how it fits in the course, or
how it relates to something the student already knows.

e. Explaining some of the terms in the statement.

f. Representing the statement with pictures, symbols, flowcharts,
tables, etc.


EXAMPLE: The following figure can be a helpful memory
device for Ohm's Law. It will help you remember
it so you can use it later on.
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EXAMPLES  are  ADEQUATE  if  they  meet  the  following  criteria: 


1.  EXAMPLES  must  be  SEPARATED  and  IDENTIFIED. 

2.  EXAMPLES  must  include  some  type  of  help. 

3.  EXAMPLES  should  range  from  "easy"  to  "hard." 

4.  EXAMPLES  should  be  representative  of  the  job  the  student  will  do 
after  training. 

5. There should be enough examples to cover the content area adequately.

6.  EXAMPLES  should  clearly  show  why  common  errors  are  wrong. 

The criteria are generally self-explanatory. SEPARATED and IDENTIFIED
are the same as for statements, and points 3 to 6 need no further
explanation. The second criterion, HELP, is applied in different
ways for different content types. Some types of HELP for each content
type are given below:

HELPS for CONCEPTS:

    Highlight the critical characteristics of an example.

    Explain why or why not something is classified
    as a member of a concept.

    Show the use of a checklist or heuristic to
    help classify.

    Simplify early examples, e.g. use line drawings
    instead of complicated photographs.

HELPS for PROCEDURES or RULES:

    Explain why each step is done.
    Explain why each step is important.
    Give additional information about how to perform the task.
    Give additional information about how to know if you've done it wrong.
    Give flowcharts, tables, etc.

HELPS for PRINCIPLES:

    Highlight important features.
    Simplify the relevant information from the
    case study in which it is embedded.
    Use logical representations of the IF-THEN relationships.
    Give additional information about how the
    principle applies, or why it doesn't.
    Give hints as to how to analyze problems.
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PRACTICE  items  are  ADEQUATE  if  they  meet  the  following  criteria: 

1.  The  PRACTICE  section  must  be  SEPARATE  and  IDENTIFIED. 

2.  The  PRACTICE  items  must  be  free  of  hints  that  wouldn't  be  present 
in  the  test  or  on  the  job. 

3.  The  PRACTICE  items  should  have  the  same  format  as  the  format  of 
the  test  items. 

4.  The  PRACTICE  items  should  range  from  easy  to  hard. 

5.  The  PRACTICE  items  should  be  typical  of  the  job  to  be  performed 
after  training. 

6. The PRACTICE items should include the opportunity for common errors.

7. The FEEDBACK must also be SEPARATED and IDENTIFIED for each practice
item.

8.  The  FEEDBACK  should  include  help  (similar  to  that  for  examples). 

(As a bare minimum, the FEEDBACK should direct the student back to
where the instruction was originally presented. However, it is
better to have a new brief presentation, because if the student
got the practice wrong, the original presentation didn't help enough.)


The  criteria  are  also  self-explanatory. 


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EXAMPLE: The next example shows an instructional presentation which
violates many of the adequacy criteria described above.
This example is followed by a more adequate presentation of
the same subject matter.


INADEQUATE  PRESENTATION  on  the  principles  of  operation  of  an  alarm  circuit: 


The alarm circuit senses extremely high temperatures. When an
extreme steam temperature occurs (which is a very dangerous condition
that may have adverse consequences for a ship and her crew), the sensing
switch contacts close thus shunting the resistor. The decreased
resistance in the circuit, according to OHM'S LAW (E=IR), causes an increase
in current flow in the circuit, which is enough to operate the alarm
relay. The relay is designed to operate at a current flow above that
normally found in the circuit. OHM'S LAW states that with voltage
constant, a decrease in resistance in the circuit must be accompanied
by an increase in current flow. The contacts of the alarm relay then
close to actuate the audible alarm device, which may consist of a warning
bell with an electrically operated clapper, or an H254 resonated horn
assembly. Both of these produce extremely loud signals so they can
overcome normal ambient noise levels.


[Figure: alarm circuit schematic, with output to external audible alarm device.]


Why  is  it  important  that  the  alarm  circuit  be  operational  at  all  times? 
Remember  what  hot  steam  can  do  to  ships  and  sailors. 


The  example  above  is  inadequate  in  several  ways.    First,  the  principle 
of  operation  of  the  circuit  is  not  separated  or  identified.    How  is  the 
student  to  know  what  to  learn  from  this  presentation?   Second,  the 
presentation  is  cluttered  with  a  lot  of  other  "nice  to  know"  information 
that  really  doesn't  help.    If  helps  were  included,  they  should  aid 
remembering  or  understanding  the  principles  of  operation  of  the  circuit. 
Also,  the  practice  is  not  separate  or  identified,  there  is  no  feedback, 
and  the  practice  really  has  nothing  to  do  with  remembering  the  principle. 

The  next  page  shows  a  more  adequate  presentation. 
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MORE ADEQUATE PRESENTATION on the principles of operation of an alarm circuit:


OPERATION  OF  THE  ALARM  CIRCUIT: 


(This  section  describes  how 
the  alarm  circuit  operates) 


BASIC  SCHEMATIC 


Extremely  high  steam  temperatures  cause  the 
switch  to  close.    This  shunts  the  resistor, 
because  the  switch  and  the  resistor  are 
connected  in  parallel.    Circuit  resistance 
is decreased, and therefore, current flow
is  increased.    The  increased  current  flow 
operates  the  relay,  closes  its  contacts, 
and  energizes  the  bell  or  horn. 

EXPLANATION 


1. High temperature closes switch.

2. Switch shunts resistor.

3. Decreased resistance = increased
current flow. (OHM'S LAW)

4. Increased current operates relay.

5. Relay contacts close.

6. Relay contacts energize bell or horn.

PRACTICE: Without using references or notes, explain how an alarm circuit
operates. Be sure to include in your explanation the important
actions that take place in the circuit. (Answer on pg. 256.)



256 

ANSWER TO PRACTICE QUESTION: There are several ways you could have explained
the operation of the alarm circuit, but your answer should have included
the following ideas:

1. High temperature causes the switch to close.

2. When the switch closes it reduces total resistance
in the circuit.

3. Decreased resistance means increased current flow.

4. The increased current flow operates the relay.

5. The relay contacts close and operate the bell or
horn.
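
The chain of ideas in the answer (switch closes, resistance drops, current rises, relay operates) can be followed with a short Ohm's Law calculation. This is a sketch only. The presentation gives no component values, so the supply voltage, the two resistances, and the relay pickup current below are hypothetical numbers chosen for illustration. The sketch also assumes the relay coil is in series with the resistor, with the switch in parallel with the resistor as the presentation states.

```python
def circuit_current(volts, relay_ohms, resistor_ohms, switch_closed):
    """Current from Ohm's Law, I = E / R, for the assumed series circuit."""
    # A closed switch in parallel with the resistor shunts it, so the
    # resistor (ideally) no longer adds resistance to the circuit.
    resistor_part = 0.0 if switch_closed else resistor_ohms
    return volts / (relay_ohms + resistor_part)


# Hypothetical values, not taken from the source.
VOLTS = 24.0
RELAY_OHMS = 100.0
RESISTOR_OHMS = 100.0
PICKUP_AMPS = 0.2  # current at which the relay is assumed to operate

normal = circuit_current(VOLTS, RELAY_OHMS, RESISTOR_OHMS, switch_closed=False)
alarm = circuit_current(VOLTS, RELAY_OHMS, RESISTOR_OHMS, switch_closed=True)

# Below pickup with the switch open, at or above pickup with it closed.
print(normal < PICKUP_AMPS, alarm >= PICKUP_AMPS)
```

With these assumed values, shunting the resistor halves the circuit resistance and doubles the current, which carries it past the assumed pickup level of the relay.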
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USING  THE  IQI: 


The  IQI  is  designed  for  QUALITY  CONTROL  during  any  objectives-based 
instructional  development  process. 

There  are  four  documents  that  comprise  the  IQI: 

1.  Introduction  and  Overview   (This  document) 

2. User's Manual (contains all IQI procedures, and examples
of their use)

3. Workbook (contains practice on all IQI procedures, with
feedback)

4. Job Performance Aid (short version of all procedures)

To  facilitate  use  of  the  IQI  procedures,  the  User's  Manual,  Workbook, 
and  JPA  were  designed  to  include  two  quality  control  forms:   The  first 
form  assesses  objective  adequacy,  and  the  second  is  used  to  determine 
test  consistency,  test  adequacy,  objective-presentation  consistency, 
and presentation adequacy. The suggested use of these forms is as
follows: 

1. Either during, or immediately after, the development of
objectives in instructional development, use the objective
adequacy form to assess the adequacy of each objective.
Any required revisions should be made before instructional
development proceeds.

2.  As  test  items  are  developed  for  each  objective,  they  should  be 
checked   for  consistency  with  objectives,  and  adequacy,  using 
the  second  form. 

3.  As  new  instructional  materials  are  developed,  or  as  existing 
materials  are  adopted,  they  are  checked  for  consistency  with 
objectives,  and  adequacy,  using  the  second  form.  Required 
revisions  to  materials  and  tests  are  made  before  they  are 
subjected to individual or small-group try-outs.

FOR  MORE  INFORMATION: 

For more information on the IQI, contact:

Navy Personnel Research and Development Center      Phone: (714) 225-7121
Code P304
San Diego, CA 92152                                 AUTOVON 933-7121

7140 
7194 
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DESIGN  OF  MACHINE  SCORABLE  "HANDS-ON"  PERFORMANCE 
TESTS  IN  A  PAPER  AND  PENCIL  MODE 


by 

ROBERT  N.  JOHNSON 
US Army Administration Center
Fort  Benjamin  Harrison,  IN  46216 


With the advent of job and task analysis in occupational training programs,
the emphasis in measurement of student proficiency has shifted from the
conventional multiple choice, paper and pencil test, to the performance
test. Performance tests have obvious advantages. They require students
to actually demonstrate their ability to perform job tasks to specified
standards under conditions approximating a real world operational
situation. Unfortunately, performance tests take more time and resources to
administer, and are normally subject to inadvertent variation in the way
they are administered and scored. These cost and reliability disadvantages
are, in fact, the major advantages of the conventional multiple choice
paper and pencil test. If the two design approaches could be combined,
the resultant test might be called a machine scorable "Hands-On" Performance
Test in a paper and pencil mode.

At the US Army Administration Center we have been experimenting with this
type  test  for  several  years.    To  date  results  indicate  that  this  type  test 
is  best  used  when  three  conditions  are  met. 

1.  The  essential  behaviors  involved  in  the  task  to  be  tested  are 
mental  (or  cognitive). 

2. Task performance results in a tangible product with measurable
characteristics.

3. The procedures or sequence used during task performance need not be
measured during the test as long as the finished product meets
specification (product measurement only).

With  that  introduction,  let  us  move  on  to  the  test  design  rationale. 

If  we  are  to  develop  a  machine  scorable  performance  test,  the  designer  must 
address  five  major  considerations  as  shown  in  Figure  1. 
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CONSIDERATIONS 

1.  Conditions  existing  prior  to  task  performance. 

2.  Initiating  cues. 

3.  Actual  task  performance. 

4.  Results  of  task  performance. 

5.  Cost  effectiveness. 

Figure  1. 

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The  conditions  existing  aspect  refers  to  the  establishment  and  provision 
of  an  environment  which  will  realistically  simulate  conditions  which  are 
similar to those under which the task is actually performed in the real
world. 

The second consideration refers to the necessity for considering the
initiating cues that require the student to recognize the need to
perform the task.

The third consideration includes the need to insure that each student
will be required to actually perform all or most of the key elements of
actual task performance during the test.

The results aspect refers to the need to insure that the test will serve
two primary purposes. First, we must insure that the test results will
separate those who can, from those who can't, adequately perform the task.
Secondly, we must assure that test results will generate diagnostic
information which can be used as a basis for improvement of our training
program.

And  lastly  we  must  provide  an  acceptable  tradeoff  between  the  efficiency 
of  the  test  and  the  costs  of  test  development  and  administration. 

I will now address each of these considerations in terms of their
inherent primary testing concerns in the design of a machine scorable
performance test.


CONSIDERATION                        PRIMARY TESTING CONCERN

1.  Conditions existing prior        -  Realism
    to task performance              -  Tools/reference availability
                                           Actual
                                           Simulated

2.  Initiating cues                  -  High fidelity
                                     -  Overcueing

Figure 2


With respect to conditions existing prior to task performance we are
concerned with what we will provide the student to facilitate his
performance during the test. The test situation should portray a realistic
setting recognizable to the student as a feasible real world situation.
It must also provide necessary background or peripheral information
necessary to task performance but not included in the test requirement.


The  test  requirement  should  clearly  state  what  the  student  is  to  do  and 
what  tools  or  references  are  available  for  use  during  the  test.  While 
realism  is  the  key  to  the  test  situation,  the  test  requirement  may  entail 
limitations  to  full  fidelity  performance.    Tools  and  references  may  have 
to  be  simulated  to  facilitate  a  paper  and  pencil  test. 

Initiating cues must be examined in detail. If the task has
straightforward initiating cues which are easily recognizable, the test
requirement should do little more than state the performance requirement. In
this case we would not be testing cue recognition. Many tasks, however,
have single or multiple initiating cues which are difficult to isolate
from other competency or extraneous cues which exist in an operational
environment. When cue recognition is a major factor in failure to
perform a task, care must be taken to insure that initiating cues are
introduced in a high fidelity manner and that neither the test
requirements nor the answer sheet overcue the student. In these cases the
answer sheet must provide alternative responses based upon the real world
behavior of failing to recognize the cue.


3.    Actual  Task  Performance  -  Key  Behaviors 

-  Task  Domain 

-  Sampling  From  Domain 

-  Task  Integrity 

-  Test  Fidelity 


Figure  3 


Since both time and cost are major test design considerations, we cannot
afford to test all behaviors inherent in an operational task.
Accordingly, it is essential that the detailed task analysis be examined to
identify the key behaviors involved in task performance. For linear tasks
consisting of step by step procedures which are always performed in the
prescribed sequence, the identification of key behaviors is no problem.
(Example - Disassemble an M-16 Rifle). Most tasks, however, have a
variable procedure during which steps in the procedure are not
necessarily performed in sequence. The initiating cue or other cues generated
during task performance dictate what is to be done next. In some
iterations of these tasks certain steps are by-passed, in others the same
steps may be repeated several times. An excellent example is the task of
"computing a travel voucher." Most of us are here on government travel
orders and will submit a travel voucher for payment upon return to our
home stations. Some travel clerk will have to compute our vouchers. The
initiating cue is generally the same: receipt of the voucher itself with
supporting orders, receipts, itinerary, etc. How the task is performed,
however, will vary as a result of the number and types of transportation
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used, types of expenses incurred, times of arrival and departure, etc.
Each of these variables is, in fact, an internal cue generated during any
single repetition of the task which alters what must be done and how,
for that specific case. These possible variations are dictated by the
task domain. By task domain I am referring to the limits or scope of the
task. We must analyze the task domain in terms of probable and possible
variations of the task faced by the job incumbent in the field and then
develop a rationale for sampling from that domain for testing purposes.
Then, and only then, can we determine which of the key behaviors should
and can be tested.

Once  key  behaviors  are  identified,  the  usual  approach  to  paper  and  pencil 
machine  scorable  test  design  is  to  develop  one  or  more  items  to  address 
each  separate  behavior.    We  end  up,  therefore,  with  a  test  which  measures 
component  behaviors  independently,  without  any  assurance  that  the  student 
can  put  them  all  together  at  the  right  time  and  place,  and  in  the  right 
sequence,  to  accomplish  the  task  as  a  whole.    This  approach  destroys  test 
fidelity and integrity of the task, and results in a test of questionable
face,  content,  or  discriminate  validity.     If  we  are  to  assert  that  we 
have  developed  a  true  performance  test  we  must  assure  that  component 
behaviors  are  exercised  by  the  student  in  the  context  of  the  total  task 
much  as  he/she  would  in  a  real  world  environment. 


4.    Result  of  Task  Performance  -  Mastery  Standards 

-  Training  Feedback 

-  Answer  Sheet  Design 

Realistic  Alternatives 
Behavioral  Alternatives 
Facilitates  Error 

Figure  4 


With  respect  to  results  of  task  performance,  we  have  several  problems. 
Since  we  have  limited  the  domain  of  required  task  performance,  real  world 
mastery  standards  may  have  to  be  adjusted.    The  trick  is  to  establish 
test  standards  which  separate    performers  from  non-performers  as  defined 
by  the  student's    ability  to  meet  actual  task  standards  on  the  job.  Test 
validation  procedures  must  address  this  primary  concern.     Since  we  are 
also concerned with training feedback, test designers must insure that
each test not only properly identifies the non-performers but also
facilitates identification of the cause of failure. When substantial percentages
of students fail a test for identical reasons we have identified possible
weaknesses or omissions in our training materials.
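
The feedback analysis described above amounts to tallying, across the students who failed, which error each one made. The sketch below shows one way this might be done once machine-scored answer sheets have been reduced to error codes. The function name `common_failure_causes`, the error codes, and the 50 percent cutoff used in the example are all hypothetical; the paper does not say what counts as a "substantial percentage."

```python
from collections import Counter


def common_failure_causes(failed_responses, threshold=0.25):
    """Return {error_code: share of failing students} for frequent errors.

    failed_responses holds one collection of error codes per failing student.
    """
    total = len(failed_responses)
    if total == 0:
        return {}
    # Count each error code at most once per student.
    counts = Counter(
        code for errors in failed_responses for code in set(errors)
    )
    return {
        code: count / total
        for code, count in counts.items()
        if count / total >= threshold
    }


# Hypothetical error codes for four students who failed the same test.
responses = [
    {"skipped_leave_status"},
    {"skipped_leave_status", "wrong_count"},
    {"skipped_leave_status"},
    {"misread_date"},
]
print(common_failure_causes(responses, threshold=0.5))
```

An error shared by most of the failing students points to the training materials rather than to the individual student. This is why the answer sheet needs alternatives that capture the specific error made.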

Answer  sheet  design  is  the  key  to  training  feedback.    Over  the  years  we 
have found that the use of real world alternatives facilitates training
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feedback  analysis.    In  most  cases  the  range  of  alternatives  provided 
should encompass all real world behaviors which would be appropriate to
any variation of the total task. The real world option of doing nothing
must be included. By providing a complete behavior set in our
alternatives, we allow the student to make errors of omission or commission.
Any reasonable error will then be accommodated by an appropriate
alternative. The alternatives available should never cue students to the fact
that they have made an error.


5.  Cost Effectiveness  -  Design Costs
                        -  Ease of Administration
                        -  Time to Administer
                        -  Validity
                             Content
                             Face
                             Discriminate

Figure 5


As with any test we must develop a balance between cost and effectiveness.
Time or cost considerations may require reduction in the fidelity of the
test, thus affecting its validity. If the level of test fidelity is
lowered to the point where the student is no longer performing key
behaviors in the context of the total task, we no longer have a "hands-on"
performance test.

These then are the considerations, concerns, and principles we will use to
develop a machine scorable "hands-on" performance test in a paper and
pencil mode.

In order to demonstrate the application of these principles, I will use
as an example the task "Select a Detail Using a Duty Roster." Since the
full task analysis will be published in these conference proceedings
(Figure 14) I will merely provide a task overview to facilitate
understanding of the development process.

Every unit and most offices in the Army are tasked to provide a detail of
one or more personnel to perform duties which are incidental to mission
accomplishment. The duty roster provides a mechanism for spreading the
burden equitably among eligible members of a unit. Normally, the senior
NCO of each unit, or his clerk, maintains a duty roster for specific or
general details. In the real world the unit is notified orally, in writing,
or by Standing Operating Procedure to provide a specific number of
personnel on a specific date for the detail. The task involves posting the
current status of each eligible member of the unit on the duty roster in
terms of availability or non-availability for the detail and determining
who should be detailed based upon longest time since last selected. A
short Army regulation dictates how availability is determined and selection
is to be made but is not normally used during performance. The task
requires personal knowledge or input as to the current and projected status
of each member who may be subject to the detail. After posting the duty
roster, the NCO selects and announces who will "pull the detail." The
posted roster serves as the basis for the next iteration of the task.
Improper selection or posting of the roster results in complaints from
the troops and impacts on morale.

Just to keep you with me, I will now show you what a real duty roster
looks like (Figure 6).

Figure 6


Note that it contains nothing but personnel identification, numbers and
letters which represent availability and selection priority, and hash
marks to indicate selection.

Let  us  now  apply  the  principles  outlined  above  to  development  of  a  machine 
scorable  paper  and  pencil  performance  test. 

With respect to providing realistic conditions, we will provide a
simulated duty roster correctly posted to include the last previous detail.
Since current and projected status of each member is available on the
job, we also provide this information. The reference regulation will not
be provided. Policy on who is eligible for the detail, the date of the
detail, the number of personnel required, and the current date will also
be provided. Since the initiating cue is clear-cut, we will simply tell
the student to post the roster and select the detail.

Accordingly,  the  instructions  to  the  student  would  read  something  like 
this: 

Test Situation: Your unit maintains a duty roster for police call.
This is a weekly detail for which you provide one soldier for one day
each week. All personnel grade E4 and below are eligible for this duty.
Shown at Figure ___ is the current status of the Police Call Roster
showing the correct last column entry for the last detail on 10 June 1981.
Figure ___ also shows a note containing known personnel status changes as
of today, 15 June 1981. You may assume that the status of each soldier
remains the same as on the current roster unless the status notes indicate
a change.

Test Requirement: Examine the Police Call Roster and read the notes
contained in Figure ___, then actually post the duty roster for 17 June
1981.

So far we have created a realistic test situation and a "hands-on"
performance requirement. Now we must develop the simulated duty roster and
the status changes with which the student will work.

Our next step is to examine the task analysis and identify the key
behaviors involved during the performance of the task. For this task the
key behaviors are shown at Figure 7 (see next page).
KEY BEHAVIORS DURING PERFORMANCE
BASED ON CURRENT/PROJECTED STATUS:

A.  Identify personnel to be added to roster.
B.  Identify personnel to be deleted from roster.
C.  Classify roster personnel as available or not available.
D.  Classify non-availables into three categories.
    1.  Authorized absence (Code A)
    2.  Other duty commitment (Code D)
    3.  Unauthorized absence (Code U)
E.  Advance eligibility number by one for:
    1.  Availables (number only)
    2.  Code D non-availables
    3.  Code U non-availables
F.  Do not advance eligibility number for Code A non-availables.
G.  Enter appropriate eligibility number, code, and/or name on duty
    roster.
H.  Select most eligible available based upon highest eligibility
    number.
I.  Select between equals by highest position on roster.
J.  Erase entries for final selections and enter hash marks.

Figure 7
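
The key behaviors in Figure 7 amount to a small posting and selection
procedure. The Python sketch below is one hedged reading of behaviors E
through J, not the regulation itself. The Soldier record, the post_roster
function, and the sample names are hypothetical, and modeling the hash
mark as a number reset to zero is an assumption made only for
illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Soldier:
    name: str
    number: int            # eligibility number from the last posted column
    status: Optional[str]  # None = available; "A", "D", or "U" = not available


def post_roster(roster: List[Soldier], needed: int) -> List[str]:
    """Post one roster column and return the names selected for the detail."""
    # Behaviors E and F: advance everyone except authorized absences (Code A).
    for soldier in roster:
        if soldier.status != "A":
            soldier.number += 1
    # Behaviors H and I: highest number first; ties go to the higher
    # position on the roster (the smaller list index).
    available = [(i, s) for i, s in enumerate(roster) if s.status is None]
    available.sort(key=lambda pair: (-pair[1].number, pair[0]))
    selected = [s for _, s in available[:needed]]
    # Behavior J: the selected entry becomes a hash mark (assumed here to
    # restart the count at zero).
    for soldier in selected:
        soldier.number = 0
    return [s.name for s in selected]


roster = [
    Soldier("Adams", 3, None),
    Soldier("Baker", 4, "A"),
    Soldier("Clark", 3, None),
    Soldier("Davis", 2, "D"),
]
print(post_roster(roster, needed=1))  # prints ['Adams']
```

In this example Baker holds the highest number but is on authorized
absence, so the choice falls between Adams and Clark, and Adams wins the
tie by roster position.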
This is all easy enough. Now the problem is to identify the task domain
and develop a reasonable sample thereof. There are many approaches but
for this case we used a Test Content Matrix which contrasts what is with
what could be, or a before and after approach, as shown at Figure 8.

Figure 8


Down the left axis we portray the situation which could exist on the
current duty roster. Along the top we portray the situation which could
exist after posting of the roster. The intersect boxes represent possible
variations of the task or the total domain of the task. For some tasks the
elements on each axis may be different but, amazingly enough, they are
often identical.

We now examine each intersect point and plot the key behaviors which would
have to be applied to that specific combination. Once plotted, we review
the results to see which variations require identical behaviors and
whether all behaviors are included. Variations with identical behaviors
are identified by a number representing that group of behaviors as shown
in Figure 9.

BEHAVIORAL GROUPINGS

For example, each block or grouping labeled as number 1 requires the
student to:

-  Classify  personnel  as  available. 

-  Advance eligibility by one.

-  Enter  appropriate  number. 

-  Select  the  individual  with  highest  eligibility  number. 

-  Erase  number  and  enter  hash  marks. 
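
The grouping of matrix cells by identical behaviors can be sketched in
Python. Because the axis labels of the Test Content Matrix did not
survive in this copy, the STATES list and the behaviors_for rule below are
hypothetical stand-ins built from the status codes in Figure 7. Only the
idea of giving one number to all cells that share the same behavior set
comes from the text, and the sketch does not attempt to reconstruct
Figure 9.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical axis labels; the real ones come from the task analysis.
STATES = ["available", "A", "D", "U"]


def behaviors_for(before: str, after: str) -> FrozenSet[str]:
    """Illustrative rule mapping one before/after cell to key behaviors."""
    needed = set()
    if after == "available":
        needed.add("classify available")
        needed.add("consider for selection")
    else:
        needed.add(f"classify code {after}")
    needed.add("do not advance number" if after == "A" else "advance number")
    if before != after:
        needed.add("post status change")
    return frozenset(needed)


def group_cells(states: List[str]) -> Dict[Tuple[str, str], int]:
    """Number every cell; cells with identical behavior sets share a number."""
    group_ids: Dict[FrozenSet[str], int] = {}
    cells: Dict[Tuple[str, str], int] = {}
    for before, after in product(states, repeat=2):
        behaviors = behaviors_for(before, after)
        # setdefault hands out the next number the first time a set appears
        cells[(before, after)] = group_ids.setdefault(behaviors, len(group_ids) + 1)
    return cells


cells = group_cells(STATES)
for before in STATES:
    print(before.ljust(10), [cells[(before, after)] for after in STATES])
```

Each printed row is one line of the matrix, and equal numbers mark
variations that call for the same behaviors, so one test case per number
samples the whole domain under the toy rule.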

To fully sample the complete domain of this task will therefore require at
least one case for each of the behavioral groupings or a total of eight
cases. To maintain realism, however, we need at least four cases of group
5 (availables) so that the student has a pool to select from. Accordingly,
we need at least 11 cases (one for each of the other seven groupings plus
four for group 5) to cover the waterfront. If test constraints
preclude that number, some behavioral groupings can be dropped based upon
importance. For example, group 8 may represent a rare and unusual
circumstance which presents no real problem in the field. It would then be
dropped. If the matrix fails to identify any key behavior, care must be
taken to introduce it in its proper context. By this approach we develop
a reasonable sample from the total domain of the task. What the student
will see looks like Figure 10 (see next page).
Figure 10


When he is finished posting the roster it should look like Figure 11 (see
next page).

Figure 11


So far we have created a high fidelity performance test in a paper and
pencil mode, but the result of performance must be hand scored. The
problem now is to determine whether we can capture the essence of that
performance in a student produced machine scorable format. We approach
this problem by examining the final product itself and the task analysis
to identify the characteristics of an acceptable product. For this
task the key characteristics are shown at Figure 12.
PRODUCT CHARACTERISTICS
When appropriate displays a (an)

"A" (Authorized Absence)
"D" (Other Duties)
"U" (Unauthorized Absences)
Advanced Number
Unadvanced Number
Additional Names
Deletions from the Roster
Absence of an Entry
Hash Marks (Selections)

Figure 12


Regardless of the intermediate processes involved, the terminal behavior
of the student is represented by these product characteristics. By
restating these characteristics in terms of student behaviors we develop
realistic and behavioral alternatives which can serve as the basis for
our answer sheet as shown in Figure 13. Correct answers to the sample
test are indicated on the answer sheet.

In effect we now have the student actually posting the duty roster and
then telling us what he has done in a machine scorable format. Note that
the alternatives are a complete behavioral set. There is nothing else
reasonable that the job incumbent can do.

The  fact  that  two  answer  categories  are  not  used  as  correct  responses  in 
this  version  of  the  test  is  irrelevant  because  those  behaviors  are 
feasible  and  will  be  used  by  non-performers  who  incorrectly  perform  the 
task.    We  are  therefore  facilitating  error  and  avoiding  overcueing  by 
maintaining  the  entire  behavioral  set  of  alternatives. 


A major advantage of an answer sheet with behavioral alternatives is that
the test situation, test requirement, and answer sheet need never be
changed. An infinite number of different tests can now be developed by
merely changing the current and projected status portrayed.

We now must validate the test by administering it to a group of masters
and non-masters to ensure that it actually discriminates between the
two. The validation will also help us to identify the cut score for
this test which equates to full task mastery.
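
One simple way to look for such a cut score, offered only as a sketch,
is to try each observed score as a candidate and keep the one that
misclassifies the fewest members of the validation group. The scores and
the best_cut_score helper are hypothetical; the paper does not state
which procedure was actually used.

```python
from typing import List, Tuple


def best_cut_score(masters: List[int], non_masters: List[int]) -> Tuple[int, int]:
    """Return (cut, errors): the lowest passing score with the fewest
    misclassified examinees among the observed scores."""
    candidates = sorted(set(masters) | set(non_masters))
    best_cut, best_errors = candidates[0], len(masters) + len(non_masters)
    for cut in candidates:
        failed_masters = sum(score < cut for score in masters)
        passed_non_masters = sum(score >= cut for score in non_masters)
        errors = failed_masters + passed_non_masters
        if errors < best_errors:  # strict comparison keeps the lowest tied cut
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


masters = [11, 10, 11, 9, 11]
non_masters = [6, 8, 7, 10, 5]
print(best_cut_score(masters, non_masters))  # prints (9, 1)
```

With these invented scores a cut of 9 passes every master and fails all
but one non-master. A test that discriminates poorly would show a large
error count at every candidate cut.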

Note that the only unrealistic behavior required by this test is to
transfer the actual coding to the answer sheet. This is considered
worthwhile in terms of reduced costs of hand scoring and the generation of
diagnostic training feedback. It does, however, produce an additional
dimension to the validation procedure. The actual posting of the duty
roster must be compared with the answer sheet during validation to
identify the propensity for transcription error. If the training
materials are designed with the same answer sheet, this problem normally
disappears.
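
The comparison of posted roster and answer sheet can be sketched as a
line by line check. The sketch assumes the posted roster has already
been hand coded into the same behavioral alternatives used on the answer
sheet; the data layout and the transcription_mismatches helper are
hypothetical.

```python
from typing import Dict, List


def transcription_mismatches(posted: Dict[str, str], marked: Dict[str, str]) -> List[str]:
    """Return the roster lines where the answer sheet disagrees with the
    entry the student actually posted on the roster."""
    return [line for line, entry in posted.items() if marked.get(line) != entry]


posted = {"line 1": "advanced number", "line 2": "A", "line 3": "hash mark"}
marked = {"line 1": "advanced number", "line 2": "D", "line 3": "hash mark"}
print(transcription_mismatches(posted, marked))  # prints ['line 2']
```

A mismatch here reflects a transfer error rather than a task error,
which is why it has to be separated out during validation.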

After administration, summarized test results in terms of item analysis
will identify behavioral errors made by individuals or groups, thus
facilitating the identification of weaknesses or omissions in our
training materials.
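
A tally of this kind is easy to sketch. The Python below assumes each
answer sheet is a mapping from roster line to the behavioral alternative
the student marked. The data layout, the alternative labels, and the
error_tally helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def error_tally(key: Dict[str, str],
                sheets: List[Dict[str, str]]) -> List[Tuple[Tuple[str, str], int]]:
    """Count each (item, wrong alternative) pair, most frequent first."""
    errors: Counter = Counter()
    for sheet in sheets:
        for item, correct in key.items():
            marked = sheet.get(item, "not marked")
            if marked != correct:
                errors[(item, marked)] += 1
    return errors.most_common()


key = {"line 1": "advanced number", "line 2": "A", "line 3": "hash mark"}
sheets = [
    {"line 1": "advanced number", "line 2": "D", "line 3": "hash mark"},
    {"line 1": "advanced number", "line 2": "D", "line 3": "advanced number"},
    {"line 1": "advanced number", "line 2": "A", "line 3": "hash mark"},
]
for (item, marked), count in error_tally(key, sheets):
    print(f"{item}: {count} of {len(sheets)} students marked '{marked}'")
```

In this invented sample two of three students code an authorized absence
as other duty, which would point to the training material on absence
categories rather than to the individual students.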

By applying the principles and procedures outlined in this paper we
can create "Hands-On" Performance Tests in a paper and pencil mode.
Task integrity and test fidelity are maintained. Content, face and
discriminate validity are inherent. Most importantly, the test does
separate the men from the boys and provides detailed training feedback.

NOTE:    The  sample  test  displayed  in  this  paper  does  not  exactly  match 
the  rationale  for  selection  from  the  task  domain  discussed  on 
page  10.     That  test  will  be  used  in  the  near  future  and  could 
not  be  compromised.     The  sample  test  displayed  merely  illustrates 
the  approach. 
A LEARNING-RECEPTIVE STATE AS INDUCED
BY AN AUDITORY SIGNAL OR FREQUENCY PULSE


Raymond O. Waldkoetter, Ed.D. and John R. Milligan, Ph.D.


US Army Research Institute for the Behavioral and Social Sciences
Fort  Sill  Field  Unit,  P.O.  Box  3066,  Fort  Sill,  Oklahoma  73503 


INTRODUCTION 

Many instructional procedures and techniques are and have been
developed to make learning more effective. From the introduction of the
printed text teachers have expounded on techniques for getting the student
or subject to more readily learn and recall the procedural transmission
of text content and material. Holding the student's attention and perhaps
arousing a little motivational commitment seem to still have a high degree
of relevance and educational concern. After the printed text came the
development of audio-visual techniques and programmed text content. Yet
relatively few students appear to become so entranced with cognitive or
non-cognitive skill learning that they will persist in spite of the lure
of television and other recreational distractions.

It would seem that added emphasis on the intrinsic self direction of
students to find a learning state that is anticipating and basically
stress free should succeed where the extrinsic, apparatus oriented approach
has not. This is not to advocate that the many advantages of apparatus
in teaching and education or training be discarded with the instructional
materials so conscientiously developed. Rather, it is to ask that the
student's perceptual awareness and dynamics for focusing attention be
re-examined to deliberately establish what sort of intrapersonal responses
to promising stimuli indicate a more persistent receptivity for learning
and success in subsequent evaluation.


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  views  of  the  Army  Research  Institute  or  the 
Department  of  the  Army. 
By now the question must explicitly occur as to what methods and
techniques are possible to bring about the self directed and intrinsic
motivation of the learner as evaluated subject. In the context of my paper's
direction, there is the accepted condition that attention is given to
any stimulus that will achieve an independent identity whether pressure,
light, sound or pain. Such a stimulus does not need to be at a conscious
level of awareness, but can exist merely as an unheard sound or even
suggestion. The necessary complement to the stimulus then is obtaining
the student's or subject's awareness that the stimulus is present and can
be responded to along with other intended behavior for an expected result.

Once this association is accepted, then a highly structured mode of
instructional communication is required to relate to the stimulus which is
unique and produces a facilitating response. Much as with the electrical
pulses going through the telephone lines, the voice message is accepted in
the electrical current and reproduced for the listening party without any
attending behavior toward the actual electrical pulse but only to the
voice message. Accordingly, if a state is induced by an auditory signal
using a pulsed sound frequency to ready the student and increase relaxation,
an attentive rhythm will possibly occur to maintain a passively focused
awareness. That is, once a characteristic alpha brain wave is induced by
a particular set of auditory stimuli, that condition should continue or be
reinforced while positively suggested (voice) content material is presented
in an initial or retrieved context.

METHOD 

Now should the hypothetical state come about where a learner could
respond  to  an  auditory  signal,  it  is  conceivable  that  a  more  receptive 
behavioral  mode  would  follow  with  less  anxiety  and  a  positive  expectation 
for  acquiring  new  information  or  recalling  that  already  stored.    There  are 
contradictory  experiences  in  the  use  of  auditory  signals  and  the  method 
for  inducing  alpha  brain  waves.    Research  has  shown  some  favorable 
experience  for  using  alpha  waves  with  a  positively  correlated  relationship 
between  percentage  of  alpha  and  memory  (Green,  1973).    This  obviously  sets 
the  stage  for  exploring  the  use  of  the  alpha  wave  state  to  figure  out  how 
learning  may  or  may  not  attain  specifically  designated  objectives  with 
complementing  positive  reinforcement  techniques  for  learning  and  retention. 

At the edge of conscious attentiveness the alpha and theta brain
waves may occasionally both appear. The more prevalent alpha frequency
is functionally apparent even when the student's eyes are open, if properly
conditioned. Usually with full physical reality contact, in an operational
mode, the beta frequency is dominant. When the alpha wave is maintained
from 8 to 13 Hz (cycles per second), occasionally dropping below
8 into theta, the student and trainee can experience a more complete sense
of relaxation with attending auditory awareness.
In this state the individual is usually described as less ego
involved and less inclined to have the usual susceptibility to inhibitions
or so-called learning blocks. Within this state decision-making
effectiveness will follow in a divergent pattern instead of converging, in
that many thoughts or images pass through the level of awareness with no
conscious attempt to analyze barriers or disciplines, intuitive impressions
are given free rein, the generation of many ideas is encouraged, and
evaluative criteria are used primarily to create a synthesis of material
for new ideas (Dirkes, 1978).

Since learning and information acquisition seem to require more
capabilities than strictly programmed instruction permits, the learning
and decision-making state must furnish ample opportunities to explore and
reinforce those ideas or relationships that lead to other testable
patterns without being an end in themselves. A frontier is recognized
herein as the imperative need to take fuller advantage of mental potential
by searching out how the learning-receptive state is attained efficiently
and what is the most productive way to use such passively focused
awareness. It is granted that total reliance on the relaxed condition would
invite diminishing returns if too much importance is accorded the
instilling process without ever getting into applied execution of ideas
or decisions.

Although the introduction of the Losanov (1975) educational
methodology is reported to have successful results in Bulgaria and now
through an Iowa State University adaptation (Prichard & Schuster, 1978),
the attentiveness of the student/subject may fluctuate depending on the
instructional mode and environmental controls. This evolving
Suggestive-Accelerated Learning and Teaching (SALT) approach consists of
inducing a relaxed and receptive cognitive state in the student by conscious
suggestions and then presentation of the learning material in combination
with background music (sound) frequencies. The pragmatic results of the
Losanov method and the Americanized version are open to critical challenge
in some respects. There is nevertheless a consistent record of repeated
uses of the techniques under the method showing both a more attentive
student adjustment and increased acquisition and retention in a shorter
span of instruction. While a mix of audio-visual and even tactile stimuli
is employed, the fundamental reference point is instructor voice or audio
direction and evaluation.

Research in this area generally shows a deficiency primarily in terms
of integration of component learning or training parts. Methodology
advocated in this paper is to bring about the introduction of a consistent
auditory signal stimulus with combined cognitive-emotional suggestions
carrying tactical information and the use of performance-oriented
biofeedback. Because auditory guidance or signal frequencies are in part
established as a known stimulus, it is further postulated that
learning-receptive states of consciousness can yield positive learning
effects as pulsed frequencies are experienced and instruction is phased
into the monaural or binaural delivery and correlated biofeedback assists
in clarifying performance objectives.

As exposure to a recently patented auditory guidance system (Monroe,
1977) has shown a potentially feasible approach in sound stimulus
experimentation, effort has been invested in exploring the applied
functions of such a system. Analysis of this Auditory Guidance System
(AGS) paradigm should attempt to cover the "unified technology" of sound
induction, content material design, and measure relationships to training
effectiveness, modes of learning expression and perception, and states of
conscious awareness. The major objective, then, is to try to investigate
"what" improved learning and operational behavior could demonstrate more
effective individual control and linkage of thought, informational, and
memory processes.

RESULTS 

Previous results from research in the area of anxiety and learning
have consistently shown important relationships between various levels of
anxiety and effectiveness of training (Isen, Clark, Shalker, & Karp, 1978).
Most instructional technology largely ignores this set of relationships
and must obtain further special elaboration to devise real applications
to surmount unidentified frustration obstacles (UFOs) in trying to
increase learning rate and mastery of complex behavior. Removal of
cognitive-emotional barriers to effective learning is closely related to
anxiety levels and has been substantially surveyed to identify targets for
perceptive changes in gaining learning efficiency (McGrath & Cohen, 1978).
Many of these research results have centered around building a learner's
self confidence and receptivity by use of conscious and unconscious
suggestion administered under specific levels of learner anxiety.
Also, relatively sophisticated biofeedback instrumentation must provide
verified relationships reinforcing the learner's capability to consciously
control certain cognitive and emotional states favorable to learning
receptiveness (Barber, 1972).

One attempt at a "unified technology" to change learning perceptions
and responses, as illustrated by the SALT programs, strives to adapt
knowledge from any pertinent field to accelerate the learning process by
integrating cognitive-emotional stimuli into instructional programs.
Conscious suggestions are given in the context of rhythmic performance
with the background sound and altered modes of auditory expression and
directed skill participation, reinforcing continually the feelings and
attitude of relaxation and full satisfaction in performing the activity.
Of many examples, both remedial work in language (Prichard & Taylor, 1976;
Caskey, 1976) and teaching a junior high school science class (Gritton &
Benitez-Bordon, 1976) have led to significant positive results in
surmounting past barriers and acquiring new information.

A somewhat similar development, which had its origin in the
transcendental meditation (TM) movement and then broke away, is that known
as "the relaxation response" technique. Peters and Benson (1978) have reported
highly positive results taking "the relaxation response" into a business
setting from their Harvard research development site. They have provided
consultative direction for voluntary "relaxation response breaks" resulting
in significantly positive employee ratings of stress symptom reduction,
improved performance and sociability-satisfaction.

Again, there is the recurring trend that physiological and
psychological measures are strongly related and subject self control brings
enhanced behavior and performance. Perhaps the remaining challenge is
to discover how to precisely integrate the sound based instructions and
rhythmic pulsing with properly reinforced learning modules and spaced
training phases for performance skills.

A report of a Russian investigation (Berlyne, 1960) described how
pairing a tone signal with an electrical shock brought about a blood
pressure or stress change. Gradually, though, with continuing trials of
signal and shock in close sequence the response was extinguished, just as
one usually adapts to a stimulus causing a mild irritant. However, with
a change in stimulus pattern the tone by itself again evoked the stress
arousal, much as one might respond to a cry in the night but only
briefly attend when performing the multitude of concurrent day-time
activities. Certainly learning and retention are affected by auditory
stimuli to a recognizable degree. So, if the type of signal is available
to induce and sustain a steady state of relaxed awareness with even
possible peaked levels, and incisive, suggestively adapted course material
is presented in well-focused, varied patterns, there should be a reasonable
probability that both general and specific performance results are well
within the scope of audio-guided behavior.

The AGS research being described by Monroe (1978) and demonstrated
in stress-reduction workshops has identified a principal component in
creating a newly innovated technique referred to as the frequency following
response (FFR). There are cumulative experimental data showing how
subjects respond to such sound frequencies structured to enhance the alpha
brain waves and other psychophysical states. Such sound which moves
through audible ranges also has masked pulses triggering what is termed
the FFR. That is, there is synchronization of the signal and subject
brain waves bringing a relaxed state, audible sound of surf and wind in
the background, and the preparatory stage is set for altering alpha with
programmed training modules and biofeedback monitoring. Drawing upon
prior audiogenic discoveries and mnemonic instructional states, attention
and learning dimensions can be charted based on the audio signals and
combined voice instructions carried by mixed rhythms of monaural and
binaural stimuli.

Following  this  line  of  exploratory  development  already  verified  in 
part  by  Monroe's  generic  patent  of  1975,  it  is  not  inconceivable  that 
research  will  quickly  extend  to  take  advantage  of  these  partially  confirmed 
audiogenic  and  adaptive  listening  pattern  correlates.    Adaptive  learning 
behavior  will  build  on  a  progressive  series  of  FFR  tape  recordings  letting 
the  student  experience  differing  information  acquisition  and  perceptual 
dimension  states.     Using  an  adaptive  mix  of  complex  audio  patterns, 
rather  than  static  audio  frequencies,  carefully  synchronized  verbal 
guidance  will  instruct  that  selective  listening  techniques  be  passively 
focused  on  critical  information  processing  requirements. 

This approach could include a fully "unified training technology"
of complementary suggestive learning and teaching precepts adhering to an
engineered human resource model of training with sound, tailored course
modules, and evaluative procedures. A parallel monitoring of
electrophysiological activity would record further audiometric responses to
indicate learning changes in attentiveness and perceptual modes. The
extent to which audio stimulation and guided instructional content
enhance operator capability would seem to deserve intensive research for
probable high risk results to increase human potential in controlling
complex mental activities.

DISCUSSION 

Should  the  development  of  an  AGS  for  accelerated  learriig  technique's 
and  instructional  system  design  prevail  in  the  face  of  those  advocating 
only  extrinsic  motivation,  it  appears  possible  to  markedly  modify  training 
patterns,  perceptual  modes  and  temporal  states.     By  enhancing  thought 
and  information  processing,  memory  and  recall  of  data,  human  factor 
variables  should  function  more  reliably  for  intra-  and  inter-system 
operations.    Learner  and  operator  functions  can  have  defined  training 
requirements  with  selected  critical  tasks  identified  for  sequential 
stages  of  assessed  proficiency.     Concurrently,  experimental  steps  would 
analyze the patently valid basis of the AGS to evaluate any constraints
in terms of information input functions and human storage security. By
designing given training objectives, students following a programmed AGS
sequence would furnish those data indicating the extent of AGS improved
behavioral  dimensions  and  operational  performance. 

Again,  taking  advantage  of  the  proprietary  AGS  monaural  and  binaural 
stimuli,  work  should  explore  the  relative  scope  of  decision-making 
requirements  involving  novel  human  factor  responses  and  functions  of 
adaptive  conscious  states  and  associated  physiological  mechanisms. 
[...]ally, there are many questions needing answers in this developmental
research area, substantiating even more the need for this comprehensive
research strategy which may bear some similarity to the initial space
research requiring interdisciplinary coordination. Now a realistic,
integrated approach toward conquering facets of human, inner-space
[...]rs can produce new educational and behavioral practices for
[...]lent learning and self control.

Rapid development of interactive computer systems and biofeedback
instrumentation mark another convergence of scientific advances making
the state of the art ready for audio and video response modes. Students
[...] operate interactively in the future so that computer assisted
instruction and self control of physiological parameters are synthesized.
[...]edly, audio conditioning and guidance research achievements are
[...]g into applied stages on a series of fronts running from sleep
[...]tion, stress and pain reduction, through suggestive-relaxed training
[...]

The "unified training technology" to optimize intrinsic learning
procedures and extrinsic motivational packages with computer assisted
dialogues must not look that far away, unless one insists on denying the
information and technological explosion. Many agencies, individuals,
and systems are confronted with the challenge to deliver the intrinsic
[...]tional technology that will herald optimal student responses, while
[...] another extrinsic direction we are exhorted to utilize more of our
[...] capacity. For example, in the wake of this turmoil, this past
[...] a policy analyst (Fletcher, 1978) for the Deputy Assistant Secretary
[...] Education noted that education would be completely revolutionized
[...] a method could evolve to enable a person to have memory recall on
[...]d or at least the processes for insuring retrieval.

What does this all have to do with personnel system testing and
[...]tion? You may rightly wonder! The AGS can yield in this "imagined
scenario" within five to ten years that instructional technology assuring
student attentiveness and rapid mastery of given subject-matter content,
[...]iable responses would have the computer video support of adaptive,
tailored testing breakthroughs (Urry, 1977) significantly testing with
[...] questions and for greater psychometric efficiency. The highly
structured student input will relate to the tailored testing and
information theory and to a greater extent close the loop on diagnosing and
prescribing accelerated or remedial learning conditions. Individuals
[...]d have more personal control for recall of their self contained
[...]-universe of test responses and respond more appropriately to the
[...]-content and selection search for precisely tailored test questions.

Over the 1990 horizon we may surely find an audio-video display
terminal and AGS embedded training modules, the student interactively
[...]d with the computer, and a wide assortment of tailored tests.
Certain audio and video stimuli patterns will guide relaxed but intensive
self-retrieval  searches.    Alternating  test  trials  should  surmount 
emotional  or  skill  barriers  with  precisely  designed  test  responses. 
Between  trial  interpretive  and  transitional  phases  will  suggest  further 
guided  instruction  to  store  responses  correlated  with  key  evaluation 
criteria  pinpointed  by  tailored  testing  dialogues. 

In  closing,  might  it  now  be  agreed  that  acquisition  and  retrieval 
of  information  is  aided  with  stress  reduction  as  indicated  by  numerous 
verified  measuring  procedures?    An  affirmative  answer  would  obviously 
suggest that instructional, information processing, and evaluative
technology  should  now  have  the  necessary  design  to  include  those  auditory 
stimuli  which  induce  more  affectively  integrated  and  responsive  behaviors. 
Complexity of Flight Path Data as an Index of Skill in Piloting
Performances from a Flight Simulator Based Job-Sample Test

Brian D. Shipley, Jr.
U.S. Army Research Institute Field Unit
Fort Rucker, Alabama 36362


To  apply  a  flight  simulator  based,  job-sample  testing  method  to 
the problem of selecting trainees for Army helicopter pilot training,
it  was  essential  to  develop  a  set  of  valid,  reliable,  and  informative 
indicators of performance skill. The job-sample test provides a
comprehensive time history record of each performance on twelve simulator control
and instrument variables and two measures of side task performance. The
problem of this methodological investigation was to develop a procedure for
reducing the resulting mass of time history data to a few meaningful indices
of  performance  skill. 

Measures  from  existing  research  were  deemed  inadequate  because  such 
measures  were:     (a)  unlikely  to  have  any  definite  theoretical  relationship 
to  specific  piloting  behaviors  of  interest  in  pilot  trainee  selection 
research,  (b)  unlikely  to  provide  sufficiently  reliable  measures  of 
individual differences, and (c) unlikely to employ defensible commanded
performance values as measurement standards and tolerances without a
detailed verification study. Consequently, the approach in this
investigation was to derive an index of performance skill from a theoretical
model,  to  establish  a  statistical  basis  for  the  data  reduction  process, 
and  to  develop  a  context  free  scoring  procedure. 

To  test  the  operational  feasibility  of  the  data  reduction  methods, 
a  set  of  computer  programs  was  developed  to  simulate  selected  aspects  of 
time  series  data  as  they  might  appear  in  piloting  performances.  The 
computer programs were used to construct theoretical samples of piloting
behavior in a set of time series data. The resulting time series were
analyzed with specially developed computer programs. The output of the
analyses was evaluated in terms of the degree to which it recovered the
known  patterns  of  variation  inserted  with  the  simulation  program.  The 
purpose  of  this  paper  is  to  describe  the  data  reduction  and  scoring 
procedures  and  their  supporting  rationale.    Some  outcomes  from  the  analyses 
of the simulated time series data are presented to illustrate the data
reduction  process. 


The  Measurement  Problem 

Operationally,  the  objective  measurement  of  piloting  performances 
consists of three major steps. First, an interval or event sampling
procedure is used to obtain a comprehensive time series record of the
performance. Second, the time series record is condensed into a set
of summary statistics. Finally, the summary statistics are translated
into values which indicate the level of excellence of the
performance. The major difficulty in this operation has been
the need for general procedures to accomplish the second and third steps.

The measurement problem exists because, historically, all three
steps were integrated by an instructor or [illegible] in a
single procedure which yielded a performance rating [illegible]
of flight simulators and inflight [illegible]
data. Although modern electronic technology has freed the researcher
from dependence on the pilot for his data collection, there is still a
[illegible] supporting software to accomplish the second
and third steps of the measurement process.


Data  Summarization 


In solving the second step, data summarization, the standard
[illegible] averaging of differences between observed
and an ideal or standard performance. These average values are inadequate
[illegible] obliterate essential information
[illegible] deviations from standard in the performance (Knoop &
Welde, [illegible]) [illegible] performance. Knoop and Welde discovered that textbook
descriptions of several maneuvers were not accurately reflected in the
performances of highly experienced pilots. Consequently, it would be
desirable to have a data summarization procedure that is context
free, i.e., independent of an ideal or standard performance, and which
[illegible] features in any arbitrary performance. As described
in the next section, an appropriate method of determining the proper degree
of fit with the method of polynomial regression can achieve part
[illegible]. The objective [illegible] entirely satisfied
if the polynomial analysis is extended by the method of Fourier
analysis when appropriate.


Polynomial  Regression 

The objective of the present data summarization procedure is to
capture all the worthwhile information in a time-series record without
redundancy. Operationally, the correspondence between information and
variance can be exploited to achieve a part of the desired result (see
[illegible], 1967 for one definition). An orthogonal polynomial routine with
a stepwise, forward fitting, recursively defined algorithm can be used
as a method of extracting information (Conte & DeBoor, 1972). With
the orthogonal method, it is reasonable to assess the percent of variance
accounted for with each newly fitted term as one criterion to determine
the optimum degree of fit (Seber, 1977).

The criterion for determining optimum degree of fit is that any
term account for at least 2 percent of the [illegible] variance. Cohen (1978)
suggests that a term which accounts for at least 3% of the variance can
be expected to have practical value in [illegible]. With this criterion
the analysis proceeds until a given term fails to account for the minimum
amount of variance. In applying the forward solution with arbitrary
data there is one cautionary note. [illegible] that it is possible
for the variance associated with the odd-numbered terms to be very small
if the shape of the series is nearly symmetric (Figure 1), and the variance
test may incorrectly fail under such circumstances. In this case, an odd/
even test of the term number in the analysis can determine the need to
extend the analysis at least one additional step.
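
The forward fitting procedure described above can be sketched as follows. This is only an illustrative reading of the text, not the author's program: the function name, the scaling of time to the interval [-1, 1], the three-term recurrence used to build the orthogonal polynomials, and the default limit of ten terms are all assumptions. The 2 percent criterion and the odd/even extension come from the text.

```python
def forward_orthogonal_fit(y, min_pct=2.0, max_degree=10):
    """Fit orthogonal polynomial terms one at a time (hypothetical sketch).

    Returns (terms, residual), where terms is a list of
    (degree, coefficient, percent_of_total_variance) for accepted terms.
    """
    n = len(y)
    x = [2.0 * i / (n - 1) - 1.0 for i in range(n)]  # time scaled to [-1, 1]
    mean = sum(y) / n
    centered = [v - mean for v in y]
    total_ss = sum(c * c for c in centered)
    residual = centered[:]
    terms = []
    if total_ss == 0.0:
        return terms, residual
    # Three-term recurrence: p_next = (x - a) * p_curr - b * p_prev
    p_prev, p_curr = [0.0] * n, [1.0] * n
    norm_prev, norm_curr = 1.0, float(n)
    for degree in range(1, min(max_degree, n - 1) + 1):
        a = sum(xi * p * p for xi, p in zip(x, p_curr)) / norm_curr
        b = norm_curr / norm_prev if degree > 1 else 0.0
        p_next = [(xi - a) * pc - b * pp
                  for xi, pc, pp in zip(x, p_curr, p_prev)]
        norm_next = sum(p * p for p in p_next)
        coef = sum(c * p for c, p in zip(centered, p_next)) / norm_next
        pct = 100.0 * coef * coef * norm_next / total_ss
        p_prev, p_curr = p_curr, p_next
        norm_prev, norm_curr = norm_curr, norm_next
        if pct >= min_pct:
            terms.append((degree, coef, pct))
            residual = [r - coef * p for r, p in zip(residual, p_curr)]
        elif degree % 2 == 1:
            continue  # odd term can vanish for a symmetric series: try one more
        else:
            break     # an even term failed the criterion: stop
    return terms, residual
```

Because the terms are orthogonal, each percent of variance can be read independently of the others, which is what makes a term-by-term stopping rule meaningful. For a symmetric series such as a parabola, the first-degree term fails the criterion but the sketch still goes on to find the second-degree term.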

Aside from the cautionary note, the polynomial regression may not
always extract all the worthwhile information in an arbitrary set of
data because of the minimum variance criterion. If the residual variance
is sufficiently large, it is possible that it still contains worthwhile
information. The method of mean square successive differences (Bloomfield,
1976) can be applied as an alternative criterion to test the hypothesis
[...] information in the residual variance. The method of mean square successive
differences takes into account the fact that time series data, as from an
aircraft's flight path, are likely to exhibit a significant degree of
correlation between adjacent values (Figure 2).

The mean square successive difference is easily obtained at each
step of the polynomial analysis by computing the squared difference between
each pair of adjacent values, summing the results, and averaging the sum
over the number of degrees of freedom (n - 1). This value is then used with
the residual variance to compute a standard normal deviate, or z-score. If the
resulting z-score fails to yield a significant difference, the residual
data are considered to represent a randomly sampled distribution and
the analysis is terminated. The analysis proceeds as long as the z-score
indicates that the residual data contain information, i.e., the sequence
is nonrandom because it is serially correlated.
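
A minimal sketch of this test is given below. The text does not state the exact formula used to form the z-score, so the sketch assumes one common form: the statistic C = 1 - (sum of squared successive differences) / (2 x sum of squared deviations), with standard error sqrt((n - 2) / ((n - 1)(n + 1))). The function name is hypothetical.

```python
import math

def mssd_z_score(residual):
    """Return (mssd, z) for a residual series (assumed form of the test)."""
    n = len(residual)
    mean = sum(residual) / n
    ss = sum((r - mean) ** 2 for r in residual)
    sq_diff = sum((b - a) ** 2 for a, b in zip(residual, residual[1:]))
    mssd = sq_diff / (n - 1)           # mean square successive difference
    c = 1.0 - sq_diff / (2.0 * ss)     # near 0 for a random sequence
    se = math.sqrt((n - 2) / ((n - 1) * (n + 1)))
    return mssd, c / se

# A smooth ramp is strongly serially correlated; an alternating series is not.
_, z_ramp = mssd_z_score([float(i) for i in range(20)])
_, z_alt = mssd_z_score([1.0, -1.0] * 10)
```

A large positive z suggests that adjacent residuals still resemble one another, so the analysis would continue; a z near zero is consistent with random residuals, and the analysis would stop.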

Suppose the z-score indicates that the residual data contain
worthwhile information but the percent of variance in the polynomial regression
has fallen below the minimum criterion of 2%. The method of Fourier
analysis may then be employed to extract information about any periodic
functions present in the residual data (Bloomfield, 1976). This method
will extract short term periodic patterns in the data and, as with
polynomial regression, the Fourier coefficients are orthogonal. Because
of orthogonality, analysis of variance methods can be employed to interpret
the contribution of the periodic terms to the data analysis. In
particular, each coefficient will account for some proportion of the residual
variance. Thus, it is possible to apply the minimum variance criterion
and a standard F-test to determine the practical value and the statistical
significance of each Fourier term.
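
The Fourier step might be sketched as below, under stated assumptions: the residuals are equally spaced, only the harmonics below the Nyquist frequency are examined, and each harmonic is tested on its own with an F ratio on 2 and n - 3 degrees of freedom that treats all remaining residual variation as error. The function name and return layout are hypothetical. The sketch returns the F ratio only, so a critical value must be taken from tables.

```python
import math

def fourier_terms(residual, min_pct=2.0):
    """Return [(j, a, b, pct, f_ratio)] for harmonics meeting min_pct."""
    n = len(residual)
    mean = sum(residual) / n
    dev = [r - mean for r in residual]
    total_ss = sum(d * d for d in dev)
    kept = []
    if total_ss == 0.0:
        return kept
    for j in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * j / n
        a = 2.0 / n * sum(d * math.cos(w * t) for t, d in enumerate(dev))
        b = 2.0 / n * sum(d * math.sin(w * t) for t, d in enumerate(dev))
        term_ss = n / 2.0 * (a * a + b * b)   # sum of squares for harmonic j
        pct = 100.0 * term_ss / total_ss      # percent of residual variance
        rest = total_ss - term_ss
        if rest > 0.0:
            f_ratio = (term_ss / 2.0) / (rest / (n - 3))
        else:
            f_ratio = math.inf
        if pct >= min_pct:
            kept.append((j, a, b, pct, f_ratio))
    return kept
```

Since the harmonics are orthogonal at these frequencies, the percentages for different values of j add up without overlap, in the same way as the polynomial terms.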

In summary, the output from this recommended data summarization
process will be a sequence of polynomial coefficients with an associated
proportion of variance. In addition, for some analyses there will be one
or more significant coefficients for the corresponding periodic Fourier
frequencies and each of these terms will also have its proportion of
variance. This approach to data summarization has three distinctive
virtues. One virtue is the capability to reconstruct the major features
of the time series record from the given polynomial and Fourier coefficients,
as is shown in Figure 3, i.e., the method captures the essential
information in the data. With this reconstructive capability, it is unnecessary
to retain the massive set of original data. A second virtue of the method
is its ability to describe an arbitrary set of data, i.e., it is context
free in that the data analyst is not required to postulate an ideal or
standard performance in advance. For a very long sequence of time series
data, the analyst merely applies the methods of piecewise analysis by
breaking the data into segments (Conte and DeBoor, 1972; Seber, 1977).
Treated in greater detail in the next section, the third virtue of the
method is the interpretability of the extracted variance patterns as an
indicator of performance excellence.


Performance  Evaluation 

The  third  step  of  the  measurement  problem  is  to  evaluate  results  from 
the  data  summarization  as  an  indicator  of  performance  excellence.  While 
the  interpretation  procedure  is  easily  described,  its  inherent  value 
depends  on  an  inference  discussed  in  the  next  section  about  the  source 
of variance in the time series data record. One interpretation method
is simply to plot the variances associated with each coefficient in the
polynomial and Fourier analyses in the order extracted by the analysis
(Figure 4) or as a function of the Fourier frequencies (Figure 5). The
problem is to determine the meaning of the patterns which might be exhibited
by  such  a  plot. 


The concept of complexity is used here to evaluate the degree of
excellence reflected in the pattern of plotted variances. Complexity,
as used here, derives its meaning from a consideration of the internal
structure of a set of time series data, e.g., an aircraft's flight
path. Complexity is closely associated with the notion of serial
correlation. A complex performance will exhibit relatively large deviations
between adjacent or closely related values in time. In contrast, a
simpler performance should have very few such deviations. By extension,
it can be shown mathematically that a complex performance will exhibit
more short-term trends and periodic terms than a simpler performance
(Bloomfield, 1976). It follows, then, that a complex performance should
require more analytic terms to achieve a given level of information
extraction.

The  key  to  the  translation  of  complexity  from  the  plotted  variances 
is  the  orthogonal  basis  of  the  data  analysis.    In  the  analysis,  the  base 
for  determining  the  percent  of  variance  is  the  total  observed  variability 
within that performance. In a forward analysis with orthogonal
polynomials each successively fitted term accounts for proportionally less of
the total variance (Seber, 1977). Consequently, an analysis which
requires many terms will, in general, reveal a smaller average variance
over the number of terms fitted. Thus, number of terms and average
percent variance should offer a useful index of degree of complexity. An
objective interpretation of degree of complexity would employ a statistical
modelling approach to fit the resultant plot of variances.
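
As a rough illustration of the index just described, the pair (number of terms, average percent variance) can be computed directly from the list of percentages produced by the analysis. The function name and the two sets of percentages below are hypothetical and serve only to show the direction of the comparison; they are not results from the job-sample test.

```python
def complexity_index(pct_variances):
    """Return (number of terms, average percent variance per term)."""
    n_terms = len(pct_variances)
    if n_terms == 0:
        return 0, 0.0
    return n_terms, sum(pct_variances) / n_terms

# Hypothetical percent-variance patterns, for illustration only.
smooth_flight = [70.0, 25.0]
rough_flight = [30.0, 20.0, 15.0, 10.0, 8.0, 5.0, 4.0, 3.0]

smooth_index = complexity_index(smooth_flight)  # few terms, large average
rough_index = complexity_index(rough_flight)    # many terms, small average
```

Under the hypothesis in the text, more terms with a smaller average share of variance would indicate a more complex performance.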


Complexity and Piloting Behavior

Level  of  experience,  i.e.,  knowledge  and  skill  in  aircraft  control 
can  be  linked  to  differences  in  degree  of  complexity.    Kelley  (1968) 
argues  that  the  experienced  pilot  is  readily  able  to  convert  his  assigned 
mission  into  a  projected  flight  profile,  to  anticipate  the  control 
movements  needed  to  achieve  that  profile  in  a  timely  fashion,  and  to 
easily  detect  and  correct  minor  errors  of  execution  or  random  perturbation 
in  aircraft  performance.    In  short,  it  seems  reasonable  to  characterize 
the  aircraft  control  produced  by  an  experienced  pilot  as  generally  smooth 
and regular, i.e., exhibiting a high degree of serial correlation, simple
structure, and a low degree of complexity.

By  contrast,  the  performance  of  the  novice  aviator  should  be  rough 
and  irregular,  i.e.,  complex  in  structure.    The  novice  has  yet  to 
acquire  the  necessary  skills  and  knowledge  associated  with  aircraft 
control.    Unable  to  project  the  desired  flight  path  sufficiently  into 
the  future,  there  will  be  many  errors  of  omission.    Unable  to  execute 
well integrated control movements, there will be many errors of commission.
In short, the novice expends a great deal of time and energy attempting
to dampen out his own errors, frequently without regard to the
accomplishment of the overall objective. Nevertheless, as the novice gains
experience, the resulting performances should exhibit a steady progression
from greater complexity to greater simplicity as learning occurs. In a
later report, data from a tryout experiment with the job-sample test
will  be  used  to  evaluate  the  validity  of  the  hypothesis  that  degree  of 
complexity  differentiates  among  performances  of  pilots  at  different  levels 
of  experience. 
During 1977, without exception, units evaluated by the 80th Maneuver Training
Command failed to demonstrate a capability to collect, synthesize or react to
intelligence interjected into ARTEPs, CPXs or FTXs. This failing across all 39
battalion size elements prompted the authors to propose a system to measure the
extent of this deficiency by looking at each [illegible] that influenced a unit's ability
to develop combat intelligence.

The first task was to isolate where the problem lay. This meant structuring
the intelligence information flow in a manner that permitted assignment of
responsibility for collecting, processing and reporting a given piece of
information. This chosen structure was based on the following data transfer points:


                     Indicator   Co. Level   Bn CP   Bn S-2   Bn Co. or other
                                                              collection agency

                         1           1         1       1             1
                         2           2         2       2             2
                         3           3         3       3             3

Data Transfer Points     I           II        III     IV            V


Of  necessity,  the  measures  used  were  output  not  process.     This  meant  the  application 
of  an  absolute  standard  that  was  recognized  by  all  concerned.    For  this  standard 
two sources were used: Soldier Manual (SM) Tasks levels 1 through 5 for Enlisted
Men and, for the Officers, the Infantry School Intelligence Training Objectives
for  Battalion  S-2s.    This  single  standard  was  used  for  Officers  regardless  of 
unit  type  or  Officer  Basic  Branch. 

Information  inputs  were  selected  from  a  pool  of  approximately  300  graphic 
and written Soviet force indicators. Each indicator supported a Soviet doctrinaire
procedure and was aggregated to present current Soviet ground force doctrine. The
decision to use this approach on how the intelligence picture should be acquired
was based on the authors' and USAINTS Collective Training Branch biases.
Information from higher was expected to be sketchy, incomplete, and, if in written form,
well nigh historical. The authors envisage an environment in which many of the
technical  collection  capabilities  will  be  severely  impaired.    The  emphasis  must 
be  on  combat  intelligence  and  maximum  utilization  of  organic  resources,  i.e.,  the 
troops.     Combat  intelligence  is  defined  as  an  intelligence  for  the  engagement  of 
the  moment.    There  are  two  aspects  of  information  originating  from  below: 

1.  Where,  in  relation  to  your  position,  is  it  being  obtained? 

2.  Is  it  a  planned  or  accidental  observation? 

The  prize  goes  to  the  S-2  that  is  acquiring  his  information  well  forward  of  his 
position  through  planned  observation. 

Prior  to  the  exercises  the  OPORDs  were  reviewed  to  determine  intelligence 
requirements. Once identified these requirements served as focus points that
ensuing indicators clarified as the problem progressed. The clarity capable of
being obtained by the S-2 depended on the relative importance of this
intelligence requirement in relation to the unit mission. This was an arbitrary decision
on  the  part  of  the  evaluator. 

How fast this clarity was obtained was dependent on the actions taken by the
S-2,  i.e.,  what  he  included  in  his  collection  plan.     The  first  point  is  whether 
he  developed  enough  information  to  determine  when  the  Soviets  would  attack, 
where  they  would  attack,  and  in  what  strength  they  would  attack.     The  second  and 
critical  point  is  whether  he  told  the  Bn/Task  Force    Commander  in  time  for  the 
Commander  to  react. 

The  application  of  this  to  Battle  Simulation  at  Bn  and  Bde  level  (Pegasus 
and CAMMS), FTXs, CPXs, TEWTs and ARTEPs is as follows:

The S-2 of Bn/Bde in a defensive position receives an intelligence estimate.
His  responsibility,  as  defined,  was  to  identify  gaps  in  it  according  to  his 
unit's mission and develop a collection plan that filled those gaps. Where he did
this he received a comprehensive view of the enemy capability/disposition in the
rear two-thirds (2/3) of Zone II. This, while not in strict accordance with the
doctrinaire capabilities of his supporting intelligence collection agencies,
forced Bn/Bde S-2s to depend on direct observation for the bulk of their
information.

Once  a  determination  was  made  as  to  what  to  portray  based  on  OPFOR  scenario, 
the  problem  was  to  select  the  series  of  indicators  to  use.     Using  US  Army-Europe 
identification  guides  over  200  graphics  for  OPFOR  vehicles  and  weaponry  were 
developed.     These  graphics  were  deliberately  made  difficult  to  interpret  by 
xeroxing  and  then  presenting  them  at  difficult  angles  for  identification.  As 
each  problem  progressed,  each  indicator  became  increasingly  easier  to  identify 
through  better  reproduction  and  better  angles  of  presentation.     The  graphics 
were  used  for  actual  observations,  i.e.,  front  line  troops  or  RECON  elements  in 
an  ARTEP,  Company  Commanders  in  a  CPX. 

In  addition  to  the  graphics  each  intelligence  requirement  was  supported  by 
a series of just over the horizon spot reports/indicators. Some of these spots
were from refugees, tracer patterns, SAM launchings, noises, dust, detritus,
flashes, shell holes, etc. Again, the emphasis was on the ability to identify
and pass on significant information, whether it arrived at Battalion, and,
finally, what happened to it at Battalion level. Additional indicators were
available for technical reports from higher if requested as part of the S-2's
collection plan, i.e., Side Looking Airborne Radar (SLAR) reports, USAF
Infrared (IR) reports, ASA reports, etc.

This  technique  allowed  the  determination  of  the  extent  of  any  deficiency, 
where the deficiencies were located, and where the Commander needed to focus his
training  to  overcome  the  deficiency.     Through  an  informal  arrangement  with 
Individual  Test  Evaluation  Directorate  (ITED)  of  the  US  Army  Training  Support 
Center,  where  pertinent  Soldier  Manual  (SM)  skills  are  included  as  part  of  a 
Skill Qualification Test (SQT), the Commander can also be told how his troops
stack up in relation to his active Army counterpart.

As  a  review,  the  following  sequence  would  be  typical: 

1845  hours:    An  OP  spots  two  OPFOR  Armored  vehicles  wheel  out  of  and  back 
into the opposite woodline. Total time of observation: 3 seconds. What are the
OP's responsibilities? The Infantry Soldier's Manual (FM 7-11) says he should
be capable of:

1. SM Task 071-11A-0803 - Report activity using SALUTE

2.  SM  Task  071-11A-0806  -  Identify  Soviet  vehicles 

3.  SM  Task  071-11A-0802  -  Speed  captured  documents  to  rear 

This is the first Measurement Point, i.e., what is the OP's reaction and
what does he report. In an ARTEP the Platoon or Company Evaluator/Controller
actually shows a flash card to a troop in the field. In a CPX a board controller
does the same to the Company Commander.

The  second  Measurement  Point    is  what  comes  through  to  Battalion,  i.e.,  to 
what  extent  does  the  Platoon  leader  and/or  Company  Commander  act  as  an  inhibitor 
to  the  flow  of  intelligence  related  information. 

The  spot  report  of  the  two  armored  vehicles  upon  arrival  at  Battalion  is 
the responsibility of the Intelligence Sergeant who should be capable of:

1.  SM  Task  071-11B-8111  -  Update  enemy  situation  map 

2.  SM  Task  071-11B-5430  -  Maintain  intelligence  workbook 

As  these  and  other  indicators  come  in  they  begin  to  form  a  picture  of  the 
intentions  of  the  OPFOR.     The  next  Measurement  Point  was  what  conclusions  did 
the  S-2  draw  and  what  did  he  report  to  the  Battalion  or  Task  Force  Commander. 
The  original  indicator  was  reintroduced  at  Battalion  level  if  it  was  lost  at  any 
Measurement Point.
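
The bookkeeping implied by these Measurement Points can be sketched as a simple tally of where each indicator was first lost. This is a hypothetical illustration only: the stage labels, the record layout, and the sample data are assumptions, not the evaluators' actual forms.

```python
# Stage labels follow the three Measurement Points described in the text.
STAGES = ["OP report", "reaches Battalion", "S-2 reports to Commander"]

def first_loss_stage(passed):
    """Return the first stage an indicator failed, or None if it survived."""
    for stage, ok in zip(STAGES, passed):
        if not ok:
            return stage
    return None

def tally_losses(records):
    """Count, per stage, how many indicators were first lost there."""
    losses = {stage: 0 for stage in STAGES}
    for passed in records.values():
        stage = first_loss_stage(passed)
        if stage is not None:
            losses[stage] += 1
    return losses

# Hypothetical exercise data: indicator name -> pass/fail at each stage.
records = {
    "two armored vehicles": [True, False, False],
    "tracer pattern": [True, True, True],
    "dust cloud": [False, False, False],
    "SAM launch": [True, True, False],
}
losses = tally_losses(records)
```

A tally of this kind would show the Commander which link in the chain most often inhibits the flow of information. Because a lost indicator is reintroduced at Battalion level, later stages can still be scored separately from the first loss.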
The use of this technique for information derived from subordinates measured
several  things: 

In an ARTEP:    1.    Did the troops demonstrate competency in SM skills:

a.  SM  Task  071-11A-0803  -  Report  activity  using  SALUTE 

b.  SM  Task  071-11A-0806  -  Identify  Soviet  vehicles 

c.  SM  Task  071-11A-0802  -  Speed  captured  documents  to  rear 

In  a  CPX:         1.    Did  the  Company  Commanders  demonstrate  competency  in  SM  skills: 

a.  SM  Task  071-11A-0803  -  Report  activity  using  SALUTE 

b.  SM  Task  071-11A-0806  -  Identify  Soviet  vehicles 

c. SM Task 071-11A-0802 - Speed captured documents to rear

ARTEP/CPX:        1.     The extent to which the Company Commander inhibits the flow
of information through his own ignorance.

2.     Did the Bn/Bde Intelligence Sergeant demonstrate, among others,
competency in SM skills:

a.  SM  Task  071-11B-8111  -  Update  enemy  situation  map 

b.  SM  Task  071-11B-5430  -  Maintain  intelligence  workbook 

3.  Did  the  S-2  correctly  interpret  the  information  provided? 

4.  Were  the  S-2  recommendations  to  the  Commander  timely  and 
concrete? 

For information derived from higher Headquarters, questions can be asked
related to:

ARTEP/CPX:        1.     Did the S-2 correctly identify gaps in the intelligence
estimate when compared to his unit's mission?

2.  Did  S-2  provide  guidance  for  development  of  collection  plan? 

3. Did the Intelligence Sergeant demonstrate competency in:

a. SM Task 11B-5451 - Extract and use information from
intelligence estimate

b. SM Task 11B-5470 - Prepare intelligence collection plan

c. SM Task 11B-5472 - Prepare Patrol Plan

d. SM Task 11B-5473 - Debrief patrols

In an ARTEP:    1.     Did the Platoon Sergeant demonstrate competency in:

a. SM Task 11B-8305 - Plan and conduct zone Recon missions

b. SM Task 11B-8320 - Plan and conduct area Recon missions

It should be noted that these examples were only a portion of the required
Soldier's Manual Tasks reviewed.

The manning requirements for this technique were:

CPX: One individual with the Company Commanders to present graphic and
written indicators, and one person with the Bn/Bde S-2 to record what came in
and reintroduce indicators that fell through the cracks.

ARTEP: Platoon/Company evaluators were given packets of indicators to be
presented to randomly selected troops between the hours of ____ to ____ on
day ____, depending on the scenario. One person was placed with the Bn/Bde
S-2.

Because of the very limited number of scenarios addressed in the ARTEPs and
battle simulations, and the availability of an absolute standard for enlisted
Soldier's Manual tasks, the validity of the technique was well grounded.

The design used was a compounded posttest only control group design. To
illustrate this design graphically, the "R" represents random selection of
units, "X" represents the administration of the ARTEP or Battle Simulation and
"O" represents the administration of the indicator packets by the observer.
It is important to remember that "X" and "O" are occurring simultaneously.

Step 1 consisted of stratified initial random selection of the first group,
exposing this group to the X1 (ARTEP/Battle Simulation), and measuring the
results with the Criterion Referenced Test, O1 (a packet of Soviet force
indicators). This led to a composite design that could be graphically depicted
as:

Group 1       R  X1 O1
Group 2       R  X2 O2

Replication of Step 1 by unit 2 provided a control group for those unmodified
portions of X1 present in X2.

A third unit was then selected and exposed to the indicator packets. This
replication using nine (9) units led to a design that could graphically be
portrayed as:

Group 1       X1 O1
Group 2       X2 O2
Group 3       X3 O3
   .
   .
Group 9       X9 O9

The  single  subject  posttest  design  was  selected  by  the  authors  for  its 
appropriateness  for  the  project  and  ease  in  administration.    The  model  dealt 
with  the  following  threats  to  internal  validity: 

1. History: History becomes a plausible rival hypothesis when specific
events occurring between X1O1, X2O2, and X3O3 could be interpreted as causing
a decrease in errors noted between O1, O2 . . . O9. The probability that
events would influence the population in all units in a similar manner and at
the same point in time is remote.

2. Maturation: This can be ruled out because of the short period of time
required to run the study for each unit.

3. Testing: This threat to internal validity can be ruled out as a
possible rival hypothesis because retesting of the same unit did not occur. The
reactive/obtrusive nature of the indicator packets was not controlled.

4. Statistical Regression: This would not become a rival hypothesis
because the selection of the participants for each "X" and corresponding "O"
was randomly made.

5. Instrumentation: This threat can be ruled out by the nature of the
indicator packets.

6. Bias resulting from differential selection: This can be ruled out as
a possible rival hypothesis because of randomly selecting the branch of the
units.

7. Experimental mortality (loss of a portion of the experimental
population) did not occur.

8. Since both selection and maturation can be ruled out, the threat posed
by a selection maturation interaction can be disregarded.

The  factors  that  are  a  threat  to  external  validity  are,  unfortunately, 
not  as  easily  dealt  with. 

1. The reactive or interaction effect of the packets was eliminated as a
plausible rival hypothesis by the nature of the research design, which did not
call for a pretest. However, the obtrusive nature of the packet administrator
may have had an unmeasured effect.

2. The interaction effects of selection and the experimental variable,
i.e., the ARTEP/Battle Simulation, cannot be ruled out. The use of an all male
population as opposed to a male and female population could conceivably have
biased the results. It was felt that the placement potential for females in
the combat arms units below Battalion level did not justify the expense of
including them.

3. Reactive effects of the experimental arrangement remain an open
question. The limited population that was available necessitated the
application of the indicator packets to all members of the population. Whether
similar results would be obtained outside of the experimental setting is not
known. The effect of the observer monitoring the operation was not measured.

4. The threat posed by multiple treatment interference is the most serious
threat to this study. The use of a single population to assess the
effectiveness of both Soviet defense and offense packets (the two packets that
were used with all units) was unfortunate but unavoidable due to the size of
the available population. The exposure of the population to the same technique
two or more times, i.e., the use of spot reports and/or technical reports in
conjunction with pictures of each indicator series, permits the possible
interpretation of results on the Soviet defense packet being a function of
exposure to the same technique of presentation that the unit experienced
earlier with Soviet offensive packets. Other special packets (Airborne, AAA,
River crossing, etc.) always followed these first two, if used at all.


Data  Analysis 

The results of the study were dealt with in two ways:

1. The average percentage of errors was computed for each unit.

2. The average percentage of errors was computed for each SM Task or
Intelligence Training Objective.

The measure of SM Task competency for levels 1-2 and for levels 4-5 presents an
incredible picture.

For Skill Level 1-3

                                            No. of Possible   Percentage
SM Task                            Errors   Responses         of Errors

071-11A-0802
"Retrograde captured Documents"       5          11               45%

071-11A-0803
"SALUTE"                             22          27               81%

071-11A-0806
"ID Soviet vehicles (RECON)"         23          27               85%
"1st Echelon Soviet"                 23          27               85%
"1st Echelon WARSAW"                 26          27               96%
"Equipment above Regiment"           25          27               92%

071-11A-0802
"OCOKA"                              27          27              100%

Mean errors (X̄) = 21.57
Standard deviation (σX) = 7.52
Mean error percentage = 83.43

The nature of the deficiency, while well known, was not envisaged as being as
acute as the data demonstrated.
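
The summary figures printed under the Skill Level 1-3 table can be checked
directly from its rows. The short Python sketch below is an illustration only,
not part of the original analysis. It assumes that the percentages were
truncated to whole numbers and that the standard deviation used the n - 1
divisor, because those two choices reproduce the printed 21.57, 7.52, and
83.43.

```python
from statistics import mean, stdev

# (errors, possible responses) for the seven Skill Level 1-3 rows, as printed.
rows = [(5, 11), (22, 27), (23, 27), (23, 27), (26, 27), (25, 27), (27, 27)]

errors = [e for e, _ in rows]
# Assumption: percentages are truncated, not rounded (25/27 is printed as 92%).
percentages = [100 * e // n for e, n in rows]

mean_errors = round(mean(errors), 2)
sd_errors = round(stdev(errors), 2)        # sample SD, n - 1 divisor (assumed)
mean_percentage = round(mean(percentages), 2)

print(mean_errors, sd_errors, mean_percentage)
```

The same three lines of arithmetic apply to the other summary tables in this
section wherever the error counts are available.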

The problem, while statistically less acute for skill level 4-5, has greater
potential for degrading the overall capability of the unit.

For Skill Level 4-5

                                                Percentage
SM Task                                         of Errors

071-11B-8111
"Update situation map"                             14%

071-11B-8112
"Preparation of situation report"                   0%

071-11B-8131
"Immediate air request"                            85%

071-11B-5423
"Preparation of Overlays"                           0%

071-11B-5430
"Establish and Maintain Intelligence
workbooks"                                         85%

071-11B-3451
"Extract Intelligence Data from
Intel Estimate"                                   100%

071-11B-5470
"Prepare Intelligence Plan"                        85%

Mean errors (X̄) = 3.71
Standard deviation (σX) = 3.20
Mean error percentage = 52.71

The unfortunate conclusion is that only in mechanical skills can the NCOs
at level 4-5 demonstrate any consistent proficiency. The skills related to
individual interpretive capabilities and skills requiring the demonstration of
initiative are those with the lowest scores. This seeming inability to act
decisively in the absence of imaginative leadership came as a very real
surprise. It was, frankly, a preconceived notion on the part of the authors
that senior NCOs would take up the slack left by poorly trained company grade
officers. This simply did not happen.

The presentation of the same requirement to troops in an ARTEP and Company
Commanders in a Battle Simulation provided a unique opportunity to compare the
capability of Company Commanders with that of their subordinates. Again, the
results were disheartening.

                                        No. of Possible   Percentage
                               Errors   Responses         of Errors

Individual Soldier Response       34          36              94%

Company Commander Response       189         209              90%

The SM Task that requires the reporting of observed phenomena (SM Task
071-11A-0803) showed the troops to be almost as proficient as the Company
Commanders in including all aspects of the required communications.

                                        No. of Possible   Percentage
                               Errors   Responses         of Errors

Individual Soldier Response        4           6              66%

Company Commander Response        18          21              86%

If the only players in the game were individual soldiers, Company Commanders,
and Intelligence Sergeants, then the picture would indeed be grim.
Unfortunately, the compounding effect of ignorance does not stop here. The
last link in the intelligence chain is the Bn S-2. To assess his ability to
deal with adequate inputs from other members of the chain, the authors:

1. Reintroduced information that had been lost.

2. Used the Intelligence Training Objectives of the Infantry School as
criteria.


The results were appalling.


                                                 No. of Possible   Percentage
Intelligence Objectives                 Errors   Responses         of Error

I 004
"Determine EEI"                            6           7               86%

I 003
"Disseminate Combat Intelligence
and Information"                           7           7              100%

I 011
"Identify OPFOR actions through
indicators"                                                            86%

I 022
"Develop Collection Plan"                                             100%

I 029
"Analyze Doctrine and Tactics
Employed by OPFOR"                                                     86%

I 030
"Identify OPFOR support training
weapons from Regiment down"                                           100%

I 033, 37, 38, 39, 48
"Identify sensors (any) required
for mission"                                                           71%

I 041
"Detect threats to Bn/TF Security"                                    100%

Mean errors (X̄) = 6.31
Standard deviation (σX) = .74
Mean error percentage = 91.13

As would be expected, no generalizable findings across units were noted.


                                  Units                          Mean %   Percentage
Respondent       1    2    3    4    5    6    7    8    9      errors   errors across:

Soldiers                        94   94                            94     7 categories
Co. Cmdrs.       73   90   75             90   80  100   90        85     7 categories
Pltn. Sgts.                    100  100                           100     1 category
Intel. Sgts.     28   57   57             57   43   71   57        53     7 categories
Bn S-2           50  100   88            100  100  100  100        91     8 categories

Unit mean
(% errors)       53   82   73   97   97   82   74   90   82

In all instances, the personnel being evaluated were given at least three
opportunities to correctly identify, report, document or analyze the
indicators. If they were successful on any of the three tries, they received
full credit.

The  inability  to  identify  Soviet  vehicles  and  weapons  regardless  of 
echelon  was  noted  across  all  units  for  Company  Commanders  and  troops.  This 
data  contrasts  sharply  with  the  first  iteration  of  the  Infantry  SQT  which 
indicated  that  85%  of  all  soldiers  could  distinguish  Soviet  combat  vehicles. 
The  ease  with  which  Active  Army  troops  seemed  to  pass  the  vehicle  recognition 
requirement  would  lead  us  to  question  that  particular  portion  of  the  SQT. 

The  percentages  speak  for  themselves.    Notably  absent  is  the  end  result 
of  so  little  expertise  being  demonstrated  across  so  many  differing  skills.  If 
the  assumption  is  made  that  without  these  skills  Bn/Bde  size  elements  cannot 
produce  combat  intelligence  then  the  probability  of  the  success  in  a  combat 
environment is very low indeed.

The essence of the process was repeated measures on the same information to
determine if the information had been:

1. Recognized

2. Reported

3. Recorded/displayed

4. Interpreted

5. Used to generate requests for additional information and/or
recommendations for a specific course of action on the part of the Battalion
Commander.

For the U.S. Army Reserve the answer is, unequivocally, NO.


Learning Aptitude, Error Tolerance, and Achievement Level
as Factors of Performance in a Visual-Tracking Task

Brian D. Shipley, Jr.
US Army Research Institute Field Unit
Fort Rucker, Alabama


INTRODUCTION 

The Army Research Institute Field Unit at the Army Aviation Center
is conducting aviator trainee selection research on job-sample, psychomotor,
information processing, and time-sharing tests to improve the methods of
selecting applicants for Army helicopter pilot training. This paper presents
preliminary results from an investigation of methods to improve the
measurement of visual tracking and time-sharing skill as a part of that
research. In this section, the test is described, some sources of confounding
are considered, and methods to overcome the confounding are presented.
Following the introduction, procedures are described for collecting data to
test selected hypotheses about confounding. Then, the results of the data
collection are presented, and the discussion section focuses on the prospects
for employing data from the visual tracking tests in time-sharing and
aviator trainee selection research.

Visual  Tracking  Test 

The visual tracking test used in the current research was designed
to measure an individual's ability to control an unstable system. The
test device is a single axis, compensatory visual tracking task described
in Pew, Rollins, Adams and Gray (1977). The operator's task is to try to
maintain a light spot in the center of a horizontal display using lateral
movements of a finger operated joy-stick.

The test difficulty is controlled by the system time constant in the
periodic processing of the control stick signal. The system time constant
is a weighting function which determines the rate of change of light spot
location in relation to control stick movements. The system time constant
operates as a divisor so that the size of the constant is inversely related
to test difficulty. The test device periodically samples the control stick
signal and computes the location of the light spot as a weighted function
of the present control input and a residual component from previous control
signals added to the present light spot location value. The residual
component is correlated with the operator's previous control behaviors and
greatly increases the difficulty of learning effective control of the light
spot.

The tracking test device can be operated in two difficulty modes:
critical and fixed difficulty tracking. The fixed difficulty, or fixed
tracking, mode was designed primarily for time-sharing applications. In this
mode, the tester fixes the time constant at a given value and the operator
performs for a fixed period of time. The measure of skill in fixed tracking
mode is the total absolute deviation of the light spot from the center of
the display, averaged across the time of performance.

The critical difficulty, or critical tracking, mode is used to estimate
the operator's effective time delay. The effective time delay represents
the minimum operator response time for the detection and correction of
errors in continuous control tasks and is used as a parameter in human
information processing and optimum control theory models of operator
behavior. Operationally, the effective time delay is an index of the
amount of time required for the operator to detect an error and to convert
information about that error into a precise control movement. Estimates
of the effective time delay from the critical tracking mode are employed
as the value of the fixed time constant in the fixed tracking mode.

To measure the effective time delay in critical tracking mode, the
test device progressively increases test difficulty as a function of time
in the performance. Difficulty is progressively increased by systematically
reducing the size of the time constant as a function of time in
performance. As the time constant grows smaller, the rate of change in
light spot location per unit time increases. Eventually, the rate of
change in light spot location becomes so rapid that the operator is
unable to maintain effective control, the location exceeds the limits of
the display, and the performance ends. The measure of skill is the
estimated effective time delay, which is the size of the system time constant
at the end of the performance. This investigation was designed to
evaluate possible confounding effects in the measurement of critical
tracking skill, i.e., measurement of the effective time delay.
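
The two scoring rules described above can be illustrated with a small
simulation. The Python sketch below is a hypothetical model, not the test
device's actual logic: the update rule (a first-order divergent system in which
the time constant T divides the sum of spot position and control input), the
simulated operator (a proportional correction applied to the position seen
`delay_s` seconds earlier), the small sinusoidal disturbance, and every
numerical constant are assumptions chosen only to make the behavior visible.

```python
import math
from collections import deque


def run_tracking(delay_s, mode="critical", fixed_T=0.6, duration_s=30.0,
                 dt=0.01, gain=2.0, start_T=2.0, shrink_per_s=0.05, limit=1.0):
    """Simulate one performance; return (mean absolute deviation, final T)."""
    steps_delayed = max(1, round(delay_s / dt))
    # Spot positions the simulated operator has not yet reacted to (oldest first).
    pending = deque([0.0] * steps_delayed, maxlen=steps_delayed)
    x, t = 0.0, 0.0                      # spot position (0 = center), elapsed time
    T = start_T if mode == "critical" else fixed_T
    total_abs, steps = 0.0, 0
    while abs(x) <= limit and t < 600.0:  # stop when the spot leaves the display
        if mode == "fixed" and t >= duration_s:
            break
        u = -gain * pending[0]            # correction based on the delayed position
        w = 0.02 * math.sin(math.pi * t)  # small hypothetical disturbance (0.5 Hz)
        pending.append(x)
        x += (dt / T) * (x + u + w)       # T is a divisor: smaller T, faster change
        total_abs += abs(x)
        steps += 1
        t += dt
        if mode == "critical":
            T *= 1.0 - shrink_per_s * dt  # difficulty rises as T shrinks
    return total_abs / steps, T


# Critical mode: the score is the time constant at the moment control is lost.
_, t_fast = run_tracking(delay_s=0.15)
_, t_slow = run_tracking(delay_s=0.30)

# Fixed mode: the score is the absolute deviation averaged over the trial.
fixed_score, _ = run_tracking(delay_s=0.20, mode="fixed")
```

Under these assumptions the simulated operator with the longer delay loses
control at a larger time constant, which is the sense in which the final time
constant serves as an estimate of the effective time delay.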

Confounding  Effects 

A  review  of  recent  research  with  the  present  test  (Pew  et  al.,  1977) 
and  two  similar  visual  tracking  tests  (Damos,  1977;  Gopher  &  North,  1974; 
North,  1977;  North,  Harris  &  Owens,  1978)  suggested  that  the  testing 
procedures  had  resulted  in  a  confounding  of  other  performance  factors 
with  the  measurement  of  visual  tracking  skill.     Pew  et  al.  defended  their 
procedures  with  evidence  of  test-retest  reliability  (Rose,  1974). 

In the research with similar tests there was evidence that confounding
effects had degraded the validity of the visual tracking data to estimate
time-sharing capacity and would probably degrade the validity of these
measures in aviator selection decisions. Gopher et al. (1974) and North
(1977) observed improvements in time-sharing performance as contrasted
with predictions from single-task performance. Gopher et al. offered
three hypotheses which might account for these discrepancies: (a) Use
of adaptive logic did not accurately estimate single-task tracking skill;
(b) There was an improvement of single-task tracking skill as a function
of practice in the time-sharing test; and (c) There is an independent
time-sharing skill which is learned only in practice with time-sharing
tests. At the conclusion of his report, North (1977) suggested that
"isolation of improvement factors is an important direction for further
research" (p. 92).

Two investigations addressed the question of confounding sources.
In a transfer of training experiment, Damos (1977) found weak evidence
of improvement of both single-task and time-sharing skill as a function
of practice in multiple-task performance. Indications of confounding
effects in the Damos (1977) data were: (a) operator unreliability as
evidenced by heterogeneity of variance; and (b) failure of 16.7% of the
subjects, 8 of 48, to achieve minimum criterion in subsequent time-sharing
practice.

Although  not  specifically  addressed  by  the  authors,  some  difficulties 
with  the  use  of  adaptive  logic  to  determine  test  difficulty  were  apparent 
in  the  investigation  of  test-retest  reliability  by  North  et  al.  (1978). 
The  adaptive  logic  was  used  to  establish  tracking  test  difficulty  in  the 
first  part  of  two  daily  testing  sessions.    After  fixing  the  level  of 
difficulty,  the  mean  root-mean-square  (RMS)  tracking  error  was  computed 
as  the  baseline  for  feedback  on  tracking  performance  in  the  time-sharing 
tests.    Table  1  is  a  summary  of  correlations  among  the  tracking  task 
difficulty  and  RMS  tracking  error  scores  across  the  two  daily  sessions 
and  two  days  of  testing. 

It is apparent from the data in Table 1 that test difficulty correlates
negatively with dual-task RMS tracking error. This has potentially
serious consequences in aviator trainee selection research because
individuals who invest greater effort, and thus achieve higher levels of
difficulty, would have greater difficulty demonstrating higher levels of
time-sharing capacity. Conversely, individuals with low effort in the
test difficulty phase would more easily exhibit greater capacity in
time-sharing. In addition, Table 1 shows a significant decrease of
correlation between single-task and time-sharing RMS error between the first
and second days of testing. Since the high test-retest correlation
(r = .90) between test difficulty across the two days of testing shows
that the subjects were consistent in the amount of effort invested in
the measurement of test difficulty, there were differential changes
among individual RMS error performances as a function of changes in
single-task performance. This is supported by the low reliability in
single-task RMS performance (rs = .01 & .34) and the moderate test-retest
reliabilities of RMS dual-task performance (rs = .49 & .69).

Therefore, the available evidence suggests that procedures for
measuring task difficulty allow for two major sources of confounding:
(a) failure to train to asymptote before measuring single-task achievement,
and (b) using current performance error as a criterion for adaptive
adjustments of test difficulty. The first source of confounding could
apparently be removed by training to asymptote or by developing a
statistical model which accurately predicts asymptotic level of achievement
from selected observations of learning performance. To remove the second
source of confounding it was necessary to explain how differences in
individual goals, effort, motivation and the like might interact with
single-task difficulty to obscure level of achievement and then to provide
a means of measuring the degree of the interaction in an individual's
tracking test data. As suggested in the following discussion, an adequate
solution to the degree of effort problem is necessary to improve the
validity of forced as well as adaptive difficulty testing paradigms.


Table 1

SELECTED INTERCORRELATIONS AND TEST-RETEST
CORRELATIONS AMONG MEASURES OF TRACKING
TASK DIFFICULTY, SINGLE- AND DUAL-TASK
RMS TRACKING ERROR (a)

                        Day 1            Day 2            Test/
                        RMS Dual-Task    RMS Dual-Task    Retest

Session A
  Task Difficulty         -.53 (b)         -.43 (b)
  RMS Single-Task                           .13 (c)         .01
  RMS Dual-Task                                             .49

Session B
  RMS Single-Task                           .10 (c)         .34
  RMS Dual-Task                                             .69

(a) North et al. (1978), p. 16.

(b) Probability is less than .05 that the absolute value of any correlation
greater than .388 is greater than zero; t(.388) = 2.064, df = 24.

(c) Probability is less than .05 that the difference between each pair of
Day 1 minus Day 2 values is greater than zero; Z(.52) - Z(.13) = 2.14
(Fisher's r to Z transform).
Tolerance  for  Error 

In a review of human performance limitations in visual tracking
tasks, Poulton (1969) uses "tolerance for error" to explain how individual
effort interacts with measures of tracking task ability. When first
introduced to a relatively easy task, i.e., one with a single dimension
or a simple control system, Poulton says that initially the operator will
be challenged and interested in the task, giving considerable attention
and effort to task performance. Poulton continues:

    But... [the operator] soon discovers what he can and cannot
    achieve, and settles down to give what he considers to be
    an adequate performance. A small error comes to be tolerated,
    and effort is directed only at preventing or correcting large
    errors (Helson, 1949, p. 495). The task becomes analogous to
    a vigilance task, and fails to occupy the man's full channel
    capacity or attention.

    At this stage the level of performance can be improved by
    presenting the man with a challenge... knowledge of results
    can reduce the size of the error which the man will tolerate,
    and so raise the standard of his performance.

    Unfortunately, a change in experimental conditions that makes
    the task harder may also present a challenge to the man. This
    means that the poorer performance which is to be expected as
    a result of increased difficulty of the task may be partly
    offset by the challenge effect. Tracking in one dimension is
    thus not as sensitive to changes in experimental conditions
    as are tasks which occupy the man's channel capacity more
    fully... (1969, pp. 312-313)

Poulton's analysis indicates that the operator may decide to limit
control effort to the prevention or correction of large errors. In his
view, this decision converts the task from pure tracking to vigilance
performance conditions. Success in vigilance performance is determined
by error detection, the degree of error to be tolerated, and skill in
error correction. Error detection will reflect differences in operator
vigilance strategy. To prevent large errors, the operator maintains a
higher level of attention or effort to anticipate and respond to
performance conditions which, if uncorrected, would result in unacceptably
large errors. On the other hand, when the operator strategy is to
correct large errors, the operator responds only if he has detected the
occurrence of deviations which have exceeded his acceptable tolerance
limit.

An operator shift from pure tracking to one of the vigilance
performance strategies would explain how the adaptive logic in the Gopher
et al. (1974) testing paradigm allowed subjects to exhibit differential
improvements over baseline predictions in dual-task performance. The adaptive
logic in the Gopher et al. paradigm was expressed as a function of target
error measured as deviation from center of the visual display. When error
was consistently less than 10% of display length, task difficulty was
progressively increased. If error consistently exceeded the 10% limit,
task difficulty was reduced. Task difficulty stabilized when the errors
were distributed about equally above and below the limiting value. Given
stable or increasing levels of skill, an operator decision to tolerate
greater error would cause an increase in the observed deviations which
would, in turn, cause a decrease in the existing estimate of task
difficulty. The amount of decrease would be a direct function of the increase
in error tolerance. In subsequent performances the operator would be able
to achieve correspondingly less average error than predicted for higher
levels of difficulty because the observed estimate of task difficulty
underestimated the true level of skill.
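
The argument above can be made concrete with a toy version of the adaptive
rule. The Python sketch below is a minimal illustration, not the Gopher et al.
procedure itself: the trial-by-trial step rule, the linear operator model (in
which error reaches the 10% limit exactly when difficulty equals the
operator's skill), and the additive treatment of tolerated error are all
assumptions introduced for the example.

```python
def stabilized_difficulty(skill, tolerance, limit=0.10, step=0.01, trials=2000):
    """Run a simple up/down adaptive rule and return the level it settles at."""
    difficulty = 0.0
    trace = []
    for _ in range(trials):
        # Hypothetical operator model: error grows in proportion to difficulty,
        # and any error the operator chooses to tolerate is added on top.
        error = limit * difficulty / skill + tolerance
        # Adaptive logic: raise difficulty when error is under the limit,
        # lower it otherwise.
        difficulty += step if error < limit else -step
        difficulty = max(difficulty, 0.0)
        trace.append(difficulty)
    tail = trace[-200:]                  # average over the stabilized portion
    return sum(tail) / len(tail)


full_effort = stabilized_difficulty(skill=1.0, tolerance=0.0)
tolerant = stabilized_difficulty(skill=1.0, tolerance=0.03)
print(round(full_effort, 2), round(tolerant, 2))
```

With identical skill, the operator who tolerates an extra 3% of display length
in error is assigned a noticeably lower difficulty level, so a baseline derived
from that level would understate what the operator can do when challenged.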

Although  the  tolerance  for  error  process  invalidates  existing 
procedures  to  estimate  task  difficulty  with  an  adaptive  logic  approach, 
it must also be accounted for in a forced difficulty paradigm, e.g.,
Pew et al. (1977). Poulton's analysis implies that a decision to limit
control  effort  represents  the  end  of  a  learning  phase  in  skill  acquisition. 
However,  the  operator  might  become  bored,  fatigued,  or  otherwise  disinclined 
to  maintain  effort  to  learn  or  perform  before  completely  mastering  the  task. 
Estimates  of  task  difficulty  before  a  decision  to  switch  from  tracking  to 
vigilance  performance  would  thus  underestimate  the  true  asymptotic  level 
of  achievement.     As  an  aside,  there  would  be  some  training  management  value 
in  knowing  the  extent  of  any  skill  improvement  which  might  occur  as  a 
function  of  practice  after  the  switch  to  the  vigilance  mode  of  performance. 

The  concept  of  tolerance  for  error  and  the  corresponding  switch  from 
tracking  to  vigilance  performance  strategies  has  definite  measurable 
implications.     Suppose  performance  is  represented  as  a  sequence  of  obser- 
vations of  a  measure  of  skill  from  repeated  trials  across  some  extended 
period  of  time.     If  greater  effort  in  the  learning  phase  corresponds  to 
improvement  of  skill  level  and  a  constant  or  perhaps  decreasing  level 
of  performance  variability,  data  from  the  repeated  observations  should 
exhibit  a  definite  trend  of  improvement  of  level  of  skill.     An  increased 
level  of  error  after  the  shift  to  the  vigilance  phase  should  be  observed 
as  a  discontinuity  of  either  mean  or  variability  of  performance.     In  the 
vigilance  phase,  the  observations  should  represent  random  samples  from  a 
distribution  with  mean  and  variance  determined  by  the  degree  of  error 
tolerance and the particular vigilance performance strategy. Statistical
methods for estimating parameters from repeated observations will be
considered after a brief summary of the implication that an operator
may attempt to minimize effort rather than maximize performance.

To summarize the implications of Poulton's concept of tolerance for
error, it was hypothesized that (a) differences in operator goals, attitudes
and the like would be represented in different performance strategies, (b)
these strategies could be operationally defined on a scale of performance
effort, and (c) different strategies and tolerances for error would lead
to measurable differences in patterns of performance associated with the
corresponding level of effort. The two extremes of the scale of effort
would be performance maximization at the high effort end and effort
minimization at the low end. Figure 1 depicts a schematic layout of the
scale of effort concept and the ordering of performance strategies which
were logically differentiated in the preceding analysis of the tolerance
for error concept.

The  Method  of  Statistical  Analysis 

Standard statistical methods from the area of time-series data
analysis provided the analytic tools needed to evaluate both trend and
variability components in a sequence of tracking performance observations.
Since these methods are commonly used in engineering and economic analyses,
some of them may not be familiar to the psychologist. An understanding of
mean square successive differences (MSSD) is crucial to the interpretation
of the results of this investigation. Therefore, MSSD is described in
limited detail here. Readers interested in greater detail should refer to
the technical sources, and those already familiar with MSSD may skip to the
next section without any loss of continuity.

Mean  square  successive  differences  is  a  measure  of  variability  of 
performance  based  on  the  order  of  the  observations  as  the  origin.    As  a 
measure  of  trend  strength  in  a  set  of  time-series  data,  e.g.,  repeated 
measures,  MSSD  derives  its  meaning  from  the  fact  that  pairs  of  adjacent 
observations  will  be  more  highly  correlated  than  will  be  pairs  of  more 
widely  separated  values.    This  sequential  dependency  of  the  observations 
on  their  order  means  that  with  a  trend  present  in  the  data,  differences 
between  pairs  of  adjacent  observations  will  be  smaller  than  when  the  data 
is  from  a  random  sample.    The  variance  is  the  average  variability  of  the 
observations  with  the  mean  as  the  origin.    Therefore,  a  comparison  of  the 
variance  with  MSSD  will  be  an  index  of  trend  strength.    When  there  is  a 
linear  or  polynomial  trend  in  the  data,  the  MSSD  will  be  small  relative 
to  the  variance  as  illustrated  in  Figure  2.    Without  a  stable  trend, 
MSSD will approach the variance as a measure of variability. (See
Brownlee, 1965, pp. 221-223 for a proof and more detail on computational
methods.)
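The comparison of MSSD with the variance can be illustrated with a short sketch. It assumes the common convention in which the mean square successive difference is halved so that it estimates the variance when the observations are random; the function name `half_mssd` and both score sequences are hypothetical and are not data from this investigation.

```python
import statistics


def half_mssd(scores):
    """Half the mean square successive difference of an ordered sequence.

    Assumed convention: dividing by 2(n - 1) makes the statistic comparable
    to the sample variance when the observations are in random order.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    squared_steps = sum((b - a) ** 2 for a, b in zip(scores, scores[1:]))
    return squared_steps / (2 * (n - 1))


# Hypothetical effective time delay scores (ms) that improve steadily.
ordered = [500, 480, 465, 440, 430, 410, 395, 380, 360, 350]
# The same ten values in a scrambled order, so the variance is identical.
scrambled = [430, 500, 360, 465, 395, 350, 480, 410, 440, 380]

for label, data in (("trend", ordered), ("scrambled", scrambled)):
    ratio = half_mssd(data) / statistics.variance(data)
    print(f"{label:10s} MSSD/variance ratio = {ratio:.3f}")
```

With the steadily improving sequence the ratio is far below 1, while the scrambled sequence, which has the same variance, gives a ratio above 1.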

Figure 1. Schematic Depiction of Tolerance for Error as a Function of Degree of Effort

Figure 2: Depiction of relationship between successive differences
and deviations from the mean in a set of time series data
with a polynomial trend.

Standard methods are used to transform the ratio of MSSD to the
variance into a standard normal deviate, i.e., a z-score (Brownlee, 1965).
As a standard normal deviate this transformed ratio can be employed to
determine the departure of the data from randomness in the conventional
statistical way. That is, the investigator posits an alpha probability
and accepts or rejects the null hypothesis of no trend as the obtained
z-score indicates. Brownlee reports that other investigators have shown
that the z-score transform is acceptable with as few as ten observations
and tables exist for use with as few as four observations. Unfortunately,
these tables are not generally available and the occasional user may find
it difficult to obtain copies (see Hart, 1942, for tables).
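A minimal sketch of one common form of this transform follows. It assumes the normal approximation in which the halved MSSD divided by the sample variance has expectation 1 and variance (n - 2)/(n² - 1) under randomness; whether this matches the exact computation used in the present investigation is an assumption, and the function names and example sequences are hypothetical.

```python
import math


def mssd_z_score(scores):
    """Approximate standard normal deviate for the MSSD-to-variance ratio.

    Positive values indicate a smooth trend; negative values indicate
    rapid oscillation. The formula is an assumed normal approximation.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    if variance == 0:
        raise ValueError("all observations are identical")
    steps = sum((b - a) ** 2 for a, b in zip(scores, scores[1:]))
    ratio = (steps / (2 * (n - 1))) / variance
    return (1.0 - ratio) / math.sqrt((n - 2) / (n * n - 1))


def classify(scores, critical_z=1.96):
    """Label a sequence nonrandom when z reaches the one-tailed criterion."""
    return "nonrandom" if mssd_z_score(scores) >= critical_z else "random"


# Hypothetical sequences of 40 trial scores (ms).
improving = [500.0 - 5.0 * trial for trial in range(40)]
alternating = [300.0, 340.0] * 20

print(classify(improving), round(mssd_z_score(improving), 2))
print(classify(alternating), round(mssd_z_score(alternating), 2))
```

Under the one-tailed rule the alternating sequence is labelled random even though its z-score is strongly negative, because only positive departures are treated as evidence of a trend.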

Research  Hypothesis 

The preceding analyses suggested that (a) the concept of tolerance
for error would associate changes in performance effort and differences in
such attitudinal variables as operator goals, motivation or interest in the
task with differences in patterns of performance, particularly variability
of performance, over time; and (b) the MSSD measure would discriminate the
presence or absence of trends in time series data. Suppose that two groups
of subjects were selected on the basis of presumed differences in attitude,
that, if present, these attitudinal differences would result in differences
in performance effort, and that members of these groups were given a series
of trials with the Pew et al. (1977) visual tracking test in critical
tracking mode. Finally, if the MSSD measure was then used to categorize
performance by the members of each group into subgroups of random or
nonrandom, analysis of trends or variability in the data for the resulting
two by two contingency table should reveal an interaction of attitudinal
group with type of performance across blocks of performance trials. The
trials would be blocked to provide means and standard deviations to estimate
the "local" level of achievement and variability of performance. The
following data collection and analysis methods were employed to test this
hypothesis of a triple interaction.


METHODS 

Subjects 

Data for this investigation were obtained from the records of 29
individuals who had participated in a comprehensive selection testing
research program. Nine of the individuals had recently resigned or been
eliminated from warrant officer or helicopter pilot training and 20 of
their contemporaries were still in the Army warrant officer helicopter
pilot training program at the US Army Aviation Center, Fort Rucker, AL.

Test  Apparatus 

A model 620 Visual Tracking Analyzer manufactured by Bolt, Beranek
and Newman, Cambridge, MA, was used to administer the visual tracking test.
The model 620 is capable of testing in either fixed or critical tracking
mode but this investigation was limited to critical tracking data. The
light spot is displayed on a horizontal unit 20 by 7.5 by 10 cm which contains
a horizontal line of 64 light emitting diodes, each spaced 2.54 mm apart.
The display unit is connected to a master control unit by a 15 foot wire
cable with connectors at each end. The master control unit provides basic
electronic circuitry, power supply, and the tester's unit. The tester's
unit provides controls to (a) select the mode of tracking operation, (b)
set the number of trials per testing block, (c) start a block of test
trials, (d) enable the start of each test trial, (e) reject any unsuitable
trial performance, and (f) conduct a standard system checkout to verify
each of the system functions and displays and provide demonstrations of
key features to each subject. Displays on the tester's unit provide status
information about the state of the system, number of the current trial in
a block, and the score for both the most recently completed trial and the
current block average.

The subject controls the location of the light spot with lateral
movements of a spring-loaded, finger operated joy-stick. One degree of
stick deflection corresponds to a movement of 2.36 mm on the visual display.
The control stick is mounted on a metal box 11.2 by 17.5 by 5 cm and it is
connected to the visual display unit by a 6 foot wire cable with connectors
at each end. The subject's control unit also contains a calibration thumb
wheel and two trial start buttons, one button on either side of the control
stick.

To  measure  the  effective  time  delay,  the  test  apparatus  is  operated  in 
the  critical  tracking  mode.    The  value  of  the  system  time  constant  at  the 
end of a trial is the index of the subject's effective time delay for that
trial. At the start of a trial the system automatically sets the time
constant  at  500  milliseconds  (ms).    As  the  trial  progresses  the  time  constant 
is  reduced  at  the  rate  of  10  ms  per  second  until  the  light  spot  has  deviated 
2.5  cm  from  the  center  of  the  display  and  at  the  rate  of  2.5  ms  per  second 
after  the  light  spot  has  exceeded  the  2.5  cm  limit.    As  the  size  of  the  time 
constant  decreases,  the  rate  of  movement  on  the  display  increases  until  the 
subject  is  unable  to  maintain  the  light  spot  location  within  the  limits  of 
the  display.    When  the  light  spot  location  exceeds  the  limits  of  the  display, 
the  system  stops  the  trial,  displays  the  trial  score  and  the  current  value  of 
the  block  mean  effective  time  delay  on  the  tester's  display,  and  signals 
an  end  of  trial  on  the  tester's  status  display.    The  tester  must  then 
record  the  trial  score  if  it  is  desired  and  enable  a  new  trial.    The  system 
is  designed  so  that  an  attempt  to  enable  a  new  trial  at  the  end  of  a  block 
will  result  in  an  end  of  block  signal  on  the  tester's  status  display. 
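
The time constant schedule described above can be sketched in a few lines. This is only an illustration and not the Model 620's actual logic: the function name, the sampling interval, the assumption that the slower rate stays in force once the 2.5 cm limit is first exceeded, and the 8 cm display half-width (roughly half of 63 diode spacings of 2.54 mm) are all assumptions introduced here.

```python
def final_time_constant(deviations_cm, dt_s=0.1, start_ms=500.0,
                        fast_rate=10.0, slow_rate=2.5,
                        slow_limit_cm=2.5, display_limit_cm=8.0):
    """Return the time constant (ms) at the moment the trial ends.

    deviations_cm is a hypothetical record of the light spot's distance
    from the display center, sampled every dt_s seconds.
    """
    time_constant = start_ms
    slowed = False  # assumed to latch once the 2.5 cm limit is exceeded
    for deviation in deviations_cm:
        if abs(deviation) > display_limit_cm:
            break  # spot left the display: the trial stops and is scored
        if abs(deviation) > slow_limit_cm:
            slowed = True
        rate = slow_rate if slowed else fast_rate  # ms per second
        time_constant -= rate * dt_s
    return time_constant


# Hypothetical trial: 10 s near center, 4 s beyond 2.5 cm, then off display.
trial = [0.0] * 100 + [3.0] * 40 + [9.0]
print(final_time_constant(trial))
```

In this hypothetical trial the time constant falls by 100 ms during the first 10 seconds and by a further 10 ms during the last 4 seconds, so the trial score would be 390 ms.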

Procedure 

Subjects  reported  to  a  standard  testing  location  according  to  a 
prescribed  week  long  testing  schedule.    This  testing  schedule  was  worked 
out  to  provide  continuity  of  testing  over  a  five  day  period  and  to  minimize 
the  test  activity  interference  with  routine  training.    The  second  day  of 
testing  was  used  to  give  40  trials  of  the  critical  tracking  test  in  4  blocks 
of 10 trials. The tester set the system to the system checkout/demonstration
mode. When the subject reported for testing, he/she was seated at a table
with the finger operated control stick. The tester then read through the
following instructions:

In this test your job is to control the movements of this light
spot (tester points to light spot on visual display) with the
control stick in front of you. Take hold of the stick in a
comfortable position and move it right and left. Notice that
the control moves the light spot back and forth on the display.
Later, when you start the test, the light spot will move
randomly right or left on the display from time to time. As a
test progresses, the time between these random movements gets
shorter and shorter and it gets harder and harder to control
the position of the light spot. Finally, the light spot goes
out of control, off the end of the display, and the system will
freeze the light spot at the end of the display. Your score
will be the time between the random movements when the light
spot is frozen.

(Tester note: Set the system in CRITICAL MODE.)

Notice that the light spot is now frozen at the end of the
display.    Move  the  control  stick  and  notice  that  the  light 
spot  does  not  move.    When  this  happens  that  means  the  end 
of  the  test  and  I  will  read  your  score  to  you.    To  start  a 
test  you  will  find  two  buttons  next  to  the  control  stick 
marked  "START".    After  I  say  "Ready"  you  may  push  either 
button  to  start  the  test.    When  you  release  the  button,  the 
light  spot  will  automatically  move  to  the  center  of  the 
display  and  the  test  will  start.     (Tester  demonstrates.) 
Do  you  have  any  questions? 

You  will  repeat  the  test  40  times  in  the  next  hour.  After 
each  trial  I  will  read  your  score  to  you.  The  smaller  your 
score the better your performance. Your objective should
be to get the smallest possible score in the fewest trials.
To get a small score it is very important to keep the light
spot  as  near  the  center  of  the  display  as  possible.  Do  you 
have  any  questions  on  scoring? 

Each  trial  was  followed  by  15  seconds  rest  and  there  was  a  2 
minute  rest  period  after  each  block  of  10  trials.    At  the  end  of  each 
trial, the tester recorded the trial score, reported it orally to the
subject, timed the rest interval, enabled the system for the next
trial or block, and at the end of the rest time, announced "Ready" to
signal the subject to start the next trial.

The subject participated in fixed difficulty tracking on the third
and fourth days of testing before receiving a final test in critical
difficulty tracking. On the third day the subject performed fixed
difficulty tracking to establish levels of skill for the time-sharing
test  given  on  the  fourth  day.    The  time-sharing  tests,  lasting  about  30 
minutes,  consisted  of  45  trials  of  fixed  difficulty  tracking  in  3  blocks 
of  15  trials,  1  block  for  each  of  3  levels  of  tracking  test  difficulty. 
Following  the  time-sharing  tests  each  subject  received  5  trials  in 
critical  tracking  mode  as  a  final  test  of  tracking  skill. 

Data 

The tester recorded the effective time delay score for each of the
40 initial and the 5 final trials of critical tracking. Recorded on a
standard form specifically designed for use with critical tracking in
the aviator selection research program, the critical tracking scores
were later transcribed to standard 80 column computer card image forms,
checked by a second person, and keypunched with verification. A special
FORTRAN program was prepared to compute means and standard deviations
for the 9 blocks of 5 trials and to compute the z-score conversion of
the MSSD measure from all the data in the first 40 trials.
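
The blocking step can be sketched as follows. This is not the FORTRAN program referred to above; the function name is hypothetical, the scores are made-up values, and the use of the sample (n - 1) standard deviation is an assumption because the source does not say which form was computed.

```python
import statistics


def block_stats(scores, block_size=5):
    """Return (mean, standard deviation) for consecutive blocks of trials."""
    if len(scores) % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    blocks = [scores[i:i + block_size]
              for i in range(0, len(scores), block_size)]
    # statistics.stdev uses the n - 1 divisor (an assumption, see lead-in).
    return [(statistics.mean(b), statistics.stdev(b)) for b in blocks]


# Hypothetical record: 40 practice trials followed by 5 final test trials.
example_scores = [float(500 - 4 * trial) for trial in range(45)]
for number, (mean, sd) in enumerate(block_stats(example_scores), start=1):
    print(f"block {number}: mean = {mean:.1f} ms, SD = {sd:.2f} ms")
```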

Design 

A  two-way  categorization  was  used  as  the  design  of  the  subsequent 
analyses.     The  two  categories  were  type  of  subject,  trainee  versus  attritee, 
and  type  of  performance,  random  (z-score  less  than  1.96)  and  nonrandom 
(z-score greater than or equal to 1.96); nonrandom in this case means
that  the  data  contained  a  linear  or  higher  order  polynomial  trend. 

Data  Analysis 

The first step in the data analysis was to compute a chi-square to test
the hypothesis that frequency of classification of type of performance
was not dependent on student category. Acceptance of this null hypothesis
of no dependency would be used as evidence for employing a least squares
analysis of variance procedure with the observed cell frequencies as the
best estimates of the proportions in the population. Rejection of the null
hypothesis would indicate a need to employ methods to adjust the degrees
of freedom in the analysis of variance procedures.

A 2 between-, 1 within-subjects repeated measures analysis of variance
was used to test hypotheses about the equality of (a) mean effective
time delay and (b) the standard deviation of effective time delay for the
five-trial blocks. Any effect in the chi-square test or the analyses of
variance was considered statistically significant at the conventional .05
level.


RESULTS 

The z-score transform from each subject's data was used to classify
his/her performance as random or nonrandom. If the z-score was less
than 1.96 the performance was classified as random. Any performance
with a z-score greater than or equal to 1.96 was considered nonrandom,
i.e., the data contained a trend. As a one-tailed test, this rule
would result in a Type I classification error about 2.5% of the time.
Table 2 gives the breakdown of number of subjects in each cell of the two
by two student category by performance type matrix.

Table 2

Breakdown of Number of Subjects

Student          Type of Performance
Category        Random    Nonrandom    Total

Trainee           11           9         20
Attritee           2           7          9

Total             13          16         29

A  chi-square  analysis  was  used  to  determine  if  the  classification 
of  random  versus  nonrandom  performance  was  dependent  on  student  category. 
The  marginal  totals  were  used  to  define  the  expected  cell  values  because 
there  was  no  prior  reason  to  expect  a  particular  breakdown  pattern.  The 
results  of  the  chi-square  analysis  revealed  no  statistically  significant 
dependency in the observed breakdown of number of subjects (χ² = 1.53,
p > .10, 1 df). This result was interpreted as evidence for using a
least squares analysis of variance for unequal cell frequencies with
mean effective time delay (Winer, 1971).
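
The chi-square computation for the two by two breakdown in Table 2 can be sketched as below. The function name is hypothetical. The reported value of 1.53 is reproduced when Yates's continuity correction is applied, which suggests, although the source does not state it, that the correction was used.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square for a 2 x 2 table [[a, b], [c, d]], 1 df.

    Expected values are derived from the marginal totals. The continuity
    correction is optional; its use here is an assumption.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    difference = abs(a * d - b * c)
    if yates:
        difference = max(difference - n / 2.0, 0.0)
    return n * difference ** 2 / (row1 * row2 * col1 * col2)


# Cell counts from Table 2: trainees (11 random, 9 nonrandom),
# attritees (2 random, 7 nonrandom).
corrected = chi_square_2x2(11, 9, 2, 7, yates=True)
uncorrected = chi_square_2x2(11, 9, 2, 7, yates=False)
print(f"with correction: {corrected:.2f}, without: {uncorrected:.2f}")
```

Both values fall below 2.71, the critical value of chi-square with 1 df at the .10 level, so either form of the test leads to the same conclusion of no significant dependency.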

Mean  Effective  Time  Delay 

The  measure  of  skill  in  the  critical  tracking  test  was  effective 
time  delay.    Means  for  each  subject  for  nine  blocks  of  five  trials  in 
the  40  practice  and  5  final  test  trials  were  analyzed  with  analysis  of 
variance (Table 3). The hypothesis of an interaction between student
category and type of performance across the nine blocks of five trials
was not confirmed by the mean effective time delay measure. The analysis
of variance revealed a statistically significant main effect for blocks of
trials which indicated that average performance had improved with
practice (Figure 3).

There were two statistically significant interactions for the mean
effective time delay. Student category interacted with blocks of trials
(F = 2.63, p < .025, df = 8, 200, proportion of variance = .023). As
shown in Figure 4, the source of this interaction effect was the larger
effective time delay means for the attritees on the first three blocks of
trials. The other statistically significant interaction was type of
performance with blocks of trials (F = 4.03, p < .001, df = 8, 200,
proportion of variance = .043). Figure 5 shows that the source of this
effect was the difference in slopes between the two types of performance
which indicates the greater rate of learning or degree of effort for the
nonrandom group.

Table 3

Analysis of Variance Summary for
Block Means of Effective Time Delay

                                                           Proportion
Source                   df    Mean Square    F-Ratio      of Variance

Total                   260       2696.22
Between Subjects         28      12648.45
  Student Category (A)    1       5024.53        .43           -
  Performance Type (B)    1      35276.90       2.96          .004
  A x B                   1      16242.16       1.36          .001
  Error                  25      11504.52
Within Subjects         232       1495.09
  Blocks (C)              8      24921.2       43.38***       .470
  A x C                   8       1513.15       2.63*         .023
  B x C                   8       2314.58       4.03**        .043
  A x B x C               8        245.53        .43
  Error                 200        574.51

*p < .025
**p < .01
***p < .001

Figure 3: Improvement of mean effective time delay as
function of blocks of trials.

Variability of Performance

Variability  of  Performance 

Analysis of the block standard deviations supported the hypothesis
that a measure of variability of performance would be more sensitive to
differences of degree of effort than a measure of central tendency.
Analysis of variance with the block standard deviations confirmed the
hypothesis that student category would interact with type of performance
across blocks of trials and revealed other significant differences
(Table 4). Figure 6 shows mean standard deviation as a function of
student category and type of performance across blocks of trials. One
[illegible] extreme differences in block to block
variability of the two attritee groups in relation to the variability of
[illegible]. The random trainee group exhibits the least block to block
variability and the nonrandom trainee group gives strong evidence of
improvement of variability with practice. Finally, the equivalence of the mean
standard deviation on the final test for each of the four groups strongly
suggests that factors other than differences in level of tracking skill
are influencing the performances of the members of the different groups.

Some caution must be used in interpreting the variability of the
random attritee group because the group has only two subjects. However,
these two subjects also have the greatest total variances of any of the
subjects (Table 5). As would be expected from an
inspection of the group plots in Figure 6, the size of the total variances
in Table 5 is correlated with group membership. This correlation is
supported by the significance of the between subjects effects in the
analysis of variance summary (Table 4). Table 6 gives the mean standard
deviations for each of the main effects and the interaction in the two by
two student category by type of performance part of the design. Finally,
Figure 7 shows the interaction of type of performance across blocks of
trials on mean block standard deviation. The interesting feature of this
interaction is the increasing variability trend of the random versus the
decreasing variability trend of the nonrandom groups. This difference of
trend of variability as a function of type of performance is strong
support for the hypothesis that MSSD is an indicator of differences in
performance patterns.

Figure 4: Interaction of Student Category on Mean Effective
Time Delay Across Blocks of Five Trials

Figure 5: Interaction of type of performance across
blocks of trials on mean effective time delay.

Table 4


Analysis of Variance Summary for
Block Standard Deviations of Effective Time Delay

                                                           Proportion
Source                   df    Mean Square    F-Ratio      of Variance

Total                   260        308.54
Between Subjects         28        420.86
  Student Category (A)    1       1734.88       6.53**        .018
  Performance Type (B)    1       1193.61       4.49*         .011
  A x B                   1       2213.94       8.33***       .024
  Error                  25        265.67
Within Subjects         232        294.67
  Blocks (C)              8        385.25       1.48          .012
  A x C                   8        201.35        .77
  B x C                   8        524.14       2.01*         .026
  A x B x C               8        935.17       3.59***       .067
  Error                 200        260.34

*p < .05
**p < .025
***p < .001

Figure 6: Interaction of student category with type of performance
across blocks of trials on mean block standard deviation.

Table 5

Sum of Block Variances for each Subject

                        Type of Performance
Student Category       Random      Nonrandom

Trainee                6509.8        4890.5
                       6850.4        5519.7
                       7570.0        6520.2
                      10040.3        8010.5
                      10903.3        8649.5
                      11549.0        9439.6
                      12211.4       12700.4
                      14599.8       15989.9
                      17152.5       18569.7
                      18239.4
                      19180.8

Attritee              21580.9        8309.4
                      29657.8        9499.5
                                    11220.6
                                    13429.4
                                    14869.6
                                    16350.1
                                    18228.8

Figure 7: Interaction of type of performance across
blocks of trials on mean block standard deviation.

Table 6

Mean Block Standard Deviations for Student
Category by Type of Performance

                        Type of Performance
Student Category      Random    Nonrandom    Combined

Trainee                33.39      29.16        31.49
Attritee               47.68      34.03        37.06

Combined               35.59      31.29        33.22
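
As an arithmetic check, the combined entries in Table 6 can be recovered from the four cell means if each cell is weighted by its number of subjects from Table 2. The sketch below assumes that weighting; the variable and function names are hypothetical.

```python
# (number of subjects from Table 2, mean block SD from Table 6) per cell
cells = {
    ("Trainee", "Random"): (11, 33.39),
    ("Trainee", "Nonrandom"): (9, 29.16),
    ("Attritee", "Random"): (2, 47.68),
    ("Attritee", "Nonrandom"): (7, 34.03),
}


def weighted_mean(selected):
    """Mean of cell means weighted by the number of subjects in each cell."""
    total_n = sum(n for n, _ in selected)
    return sum(n * mean for n, mean in selected) / total_n


def margin(position, level):
    """Combined mean for one student category (0) or performance type (1)."""
    return weighted_mean([v for k, v in cells.items() if k[position] == level])


for level in ("Trainee", "Attritee"):
    print(level, round(margin(0, level), 2))
for level in ("Random", "Nonrandom"):
    print(level, round(margin(1, level), 2))
print("Overall", round(weighted_mean(list(cells.values())), 2))
```

The weighted values agree with the combined row and column of Table 6 to two decimal places.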

DISCUSSION 

The main hypothesis of this investigation was that degree of effort
would be a source of confounding in tracking test performance. The
results confirmed this hypothesis if degree of effort varies with
motivation to perform and differences in motivation depend on student category.
The major source of this confounding was differences in variability of
performance as a function of number of test trials. The source of the
interaction is most clearly apparent in the comparison of type of performance
with student category across the blocks of trials on mean block standard
deviation  (Figure  6).    Inspection  of  mean  effective  time  delay  interactions 
shows  that  mean  effective  time  delay  is  correlated  with  variability  of 
performance  which  is  consistent  with  Poulton's  hypothesis  of  a  shift  in 
performance  strategy. 

A second hypothesis was that inadequate practice was a source of
confounding in the measurement of level of achievement in previous research.
In this investigation, level of achievement is represented by mean
effective time delay and Figure 3 clearly shows a large improvement of this
measure, even after the eighth trial block. A comparison of mean
effective time delay from this investigation and a previous study by Pew et
al. (1977) with the same tracking test further supports the hypothesis of
inadequate practice. In the Pew et al. study 92 students in Air Force
Undergraduate Pilot Training at Williams Air Force Base, AZ, performed
10 trials of the critical tracking test. Table 7 is a comparison of the
mean and standard deviation of effective time delay for the last 7 trials
of the Pew et al. study with the means and standard deviations of 5-trial
blocks and the final test for trainees in the present study. (Trainees
were used for comparability of populations.) The important comparisons
in Table 7 show that there were no significant differences between the
Pew et al. results and those of this investigation on the first 4 blocks
of trials.


Table 7

Means and Standard Deviations of Effective Time Delay from Pew et al. (1977)
and Blocks of Trials for Trainees from the Present Investigation

            Pew                             Block(b)                               Final
Measure   et al.(a)    1      2      3      4      5       6       7       8       Test

Mean       340.3     344.0  327.6  318.5  308.5  300.0*  303.1*  288.7*  298.9*  250.9**
S.D.        52.3      66.3   59.4   59.4   58.2   55.3    54.1    58.4    60.1    40.0

(a) n = 92; (b) n = 20; *p < .05; **p < .001

The results of this investigation indicate that mean square successive
differences (MSSD) should be a useful statistical tool in subsequent
research. Although MSSD was employed in the present investigation as a
one-tailed test to indicate polynomial trends, significant negative values of
the z-score derived from MSSD would indicate that the data contained
systematic cyclic or periodic trends, i.e., trends describable with
trigonometric functions. This latter feature makes MSSD especially useful in
the analysis of tracking performance from continuous control tasks where
periodic features of the data may indicate important differences in operator
control behaviors. With a significant positive or negative z-score from the
MSSD measure, the data analyst is justified in a detailed search for the
sources of the specific polynomial or periodic trends in an individual set
of data.

Research  is  needed  to  establish  the  predictive  validity  of  differences 
in  patterns  of  performance  from  the  tracking  test  for  overall  success  in 
pilot  training.     Interviews  with  instructor  pilots  have  indicated  that  lack 
of  motivation  is  frequently  a  source  of  inadequate  student  progress  in 
Army helicopter pilot training. This instructor pilot observation is
supported by two sources of additional evidence. First, some 50% of all
attrition in the Army helicopter pilot training program results from
resignations (Elliot & Joyce, 1978). Furthermore, motivation was identified
as a major factor among resigning students. Second, an unreported exploratory
investigation at the US Army Aviation Center found a correlation of .78 between
instructor pilot ratings of basic student pilot qualities, e.g., motivation,
judgment and the like, on daily grade sheets from early primary training and
subsequent eliminations from advanced training. This evidence suggests that
the present approach may yield a substantial reduction in the residual
variance of the aviator trainee selection testing process.

The approach used in this investigation also presents some interesting
possibilities for further research in aviator trainee selection and
management methods. For example, detailed analyses of individual performance
trends were not accomplished in the present investigation. However, the
logical analysis of degree of effort depicted in Figure 1 indicates that
differences in such trends should further differentiate among types of
performance and the associated performance strategies. One interesting
hypothesis is that learning behavior, i.e., performance strategy, in a
simple tracking test would predict learning behavior in more complicated
tasks, i.e., performance in aircraft control.

Cronbach  and  Snow  (1977)  evaluate  the  hypothesis  of  prediction  from 
learning behavior in these terms:

If individual differences prove to be stable and predictable, one can
capitalize on findings from the experiment in which learning is
observed only for a short time, perhaps on just one task or topic.
If individual differences are radically altered during learning...
the short-term experiments...will not give practically useful
conclusions. Under this hypothesis, persons who learn most efficiently,
among  a  group  all  of  whom  have  become  familiar  with  the  problem, 
would  not  generally  be  the  ones  who  learned  most  efficiently  at  the 
outset;  hence,  they  would  not  have  been  among  the  most  successful 
learners  in  a  short  experiment  (p.  126). 

The  major  issue  is  whether  attitudinal  differences  such  as  motivation  which 
are  reflected  in  the  degree  of  effort  measurement  procedures  are  relatively 
stable  characteristics  of  an  individual's  learning  behavior.    As  a  test  of 
the Cronbach and Snow hypothesis in a subsequent investigation, the methods
of  this  investigation  will  be  employed  to  predict  performances  of  these 
same  subjects  in  fixed  stability  tracking,  time-sharing,  and  a  job-sample 
test  administered  on  the  UH-1  flight  simulator. 
Editorial Changes to Preprint Draft of the Revised FAST

Draft     Item           Comment

MANUAL

5         Table 1        Under heading "FAST Booklet..." the phrase
                         "Stick and Rudder" should be replaced with the
                         word "Cyclic."

12        Directions     The paragraph beginning "THE SPECIAL
                         DIRECTIONS..." should be changed as follows:

                         (1) Strike all the words in the third line
                             beginning at "...A CORRECTION" and ending
                             in the sixth line at the end of the
                             sentence ending with the words
                             "...CORRECTION FACTOR IS APPLIED."

                         (2) Insert the following in place of the words
                             deleted in (1) above:

                             Insert, following "...OR IF":
                             "THE TEST SCORE WILL BE ADJUSTED BY
                             SUBTRACTING A FRACTION OF THE WRONG
                             ANSWERS FROM THE NUMBER OF RIGHT ANSWERS.
                             EVEN ON THOSE TESTS WHERE THE SCORE IS
                             ADJUSTED FOR WRONG ANSWERS, YOU SHOULD..."

TEST BOOKLET

2         Instructions   Insert following the last paragraph:
                         "DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO
                         DO SO."
For several years the Air Force has been involved in the investigation
of and experimentation with simulators for teaching hands-on
maintenance tasks [Miller, 1976]. Investigations have shown that
simulators provide troubleshooting instruction which is at least equal to
that afforded by actual equipment while offering additional
opportunities such as increased individualized instruction by enabling
more practice with job skills, increased assistance early in the
instructional process via CAI techniques, increased consistency in student
evaluations, and decreased equipment costs associated with breakage,
obsolescence, and the need for special purpose training equipment.
Furthermore, the application of computer-assisted performance training
to troubleshooting instruction provides realistic feedback while
manipulating the real time variables, for example, an induced malfunction
from a student action that might normally occur only after several
hours of time in actual equipment.

Due  to  the  obvious  cost-effectiveness,  most  computer-controlled 
simulators  employing  CAI  and  CMI  have  been  constructed  to  replace 
sophisticated,  costly  equipment  requiring  high  level  skill  training. 
On  the  other  hand,  simulators  which  teach  fundamental  principles  and 
provide training of basic skills are generally not controlled by
computer or, if controlled by computer, exhibit little or no CAI and CMI
instructional  techniques.     If  CAI  and  CMI  techniques  are  to  be  utilized 
for  such  applications,  it  is  necessary  from  a  cost  standpoint  that  the 
simulator  exhibit  general  purpose  properties  that  allow  interchangeable 
simulation  modules  on  a  mainframe  console  containing  the  computer  to 
provide  usage  in  a  wide  variety  of  job  skills  training. 

TRAINING  CARREL 

To continue the investigation of the utility of general purpose
simulation in formal technical training environments, the Air Force,
through coordinated efforts between personnel of the Human Resources
Laboratory (at Lowry Air Force Base) and the Denver Research Institute,
developed a computer-assisted performance training carrel. The carrel
contains interface circuitry to/from a PDP-11 computer (a minicomputer
manufactured by Digital Equipment Corporation) and a PLATO IV terminal
(the University of Illinois plasma screen terminal) designed with I/O
bus circuitry for control of peripherals (various devices such as
switch closures, digital-to-analog converters, waveform generators,
etc.).
ructional carrel has a random access slide projector to
junction with the PLATO screen which also contains a
pability. Additionally, the carrel contains provisions
simulation panels which offer flexibility in implemen-
of simulations via several panels containing different
l of which can be controlled by the I/O circuitry for
bes, displays and switches, as well as voltage, resis-
ent parameters for equipment familiarization, instruc-
fiooting and maintenance.

l was designed in a way that allows experiments involv-
ion of concepts and the retention of learning to be en-
)Mities formed by a variety of visual and auditory cues
The communication between the computer devices permits
passed to the more efficient of the two computers for
application.

.  AS  A  TROUBLESHOOTING  SIMULATOR 

lent was designed to evaluate both the performance train-
each of its component parts, software and hardware,
r the evaluation was to prepare a short course demon-
existing instructional module in order to compare car-
with traditional instruction. The instructional
d was a "Troubleshooting Fundamentals" module in an Air
ing Command Electronics Principles course. As such, the
implemented for use in this module resulted in the car-
a troubleshooting simulator.

ed module allowed a study of each of the functions of
a minimum of disruption in the course where it ap-
chnical level allowed a sophistication of simulated
and training circuitry that required the utilization of
digital controllers/sensors in conjunction with the pro-
ipllng minicomputer (PDP-11) to interpret data to the
omputer (PLATO IV system).

forty students was selected, twenty of whom were
dom to receive traditional lectures/demonstrations of
urse module. The remaining twenty students received
via the troubleshooting simulator. Course objectives
were paralleled in both types of delivery to allow the
criterion measurement on both groups. Thus, the em-
d on the evaluation of hardware and software functions
, and no attempt was made to optimize the instructional
delivery on the simulator.

of  the  module  chosen  required  the  simulation  of  a 
and  various  DC  trainer  "schematic  boards"  which 
could easily be interchanged on the simulator panel. The schematic
boards exhibited essentially the same appearance as those used in a
traditional delivery of the course. Thus, the use of the boards was
familiar to students. Figure 1 illustrates the interaction between
the student and the performance carrel during the delivery of the
instructional module.


[Figure 1 diagram not reproduced; legible labels: I/O bus, PDP-11, circuitry.]

FIGURE 1. PERFORMANCE CARREL SYSTEM DIAGRAM ILLUSTRATING STUDENT INTERACTION DEVICES


The  performance  training  was  presented  to  the  student  in  two 
modes  of  instruction,  namely,  the  presentation  of  theory  involving 
troubleshooting  fundamentals  via  programmed  instruction,  and  secondly, 
a  mode  whereby  the  student  was  given  a  troubleshooting  problem  and  had 
to  proceed  by  interacting  with  the  simulated  equipment  independent  of 
programmed instruction. In the first mode, the student had to be
responsive to computer actions. In the second mode, the computer had to
be  responsive  to  student  actions. 


The computer monitored two classes of error made by the student
during delivery of the module. The first class included misuse of the
equipment, while the second class dealt with incorrect troubleshooting
logic on the part of the student. For each class of error, the PLATO
IV system responded with remedial instruction. Several schematic
boards representing various D.C. trainer circuits of different levels
of difficulty were presented to the student in a manner to provide the
opportunity for successfully completing a problem on one level of
difficulty before advancing to a more difficult level.

Since only the PLATO IV has control over these devices in the
carrel which communicate directly with the student, the software was
developed so that the PLATO IV system controlled the course delivery
in a way that the interaction of the student was with the PLATO IV
terminal only. The PLATO IV touch panel, slide projector, and keyboard
were the only peripherals used during this mode of instruction.

For the second mode of instruction, when the computer had to be
responsive to the student, the function of the PLATO IV system was to
establish which schematic board(s) would be used and which
malfunction(s) were to be implemented. Also, the PLATO IV presented all
remedial instruction required during this phase of performance training.
In this case, the PDP-11 was used for the simulation of devices on the
carrel control panel and for the time history monitoring of student
actions.

The software requirements established for the PDP-11 allow it to
fulfill three major functions:

a.  To simulate equipment (that is, the multimeter and DC
trainer schematic boards) based on tables which define
responses to student actions at the control panel of the
performance carrel.

b.  To monitor the actions of the student.

c.  To communicate student actions or status changes to the
training computer and accept instructions from the
training computer.
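
The three functions listed above can be pictured with a small table-driven sketch. This is only an illustration in Python, not the PDP-11 software itself: the table contents, board and malfunction names, message format, and class name are all hypothetical, since the paper does not give them.

```python
import time
from dataclasses import dataclass, field

# Hypothetical response table: (board, malfunction, test point) -> meter reading in volts.
RESPONSE_TABLE = {
    ("board_1", "none", 3): 12.0,
    ("board_1", "open_lamp", 3): 0.0,
}


@dataclass
class CarrelSimulator:
    board_id: str
    malfunction: str = "none"
    action_log: list = field(default_factory=list)  # time history of student actions
    outbox: list = field(default_factory=list)      # messages for the training computer

    def probe(self, test_point, clock=time.monotonic):
        """Handle a student probing one test point with the simulated multimeter."""
        # (b) Monitor: record the action together with a timestamp.
        self.action_log.append((clock(), "probe", test_point))
        # (a) Simulate: look up the reading defined for this board and malfunction.
        reading = RESPONSE_TABLE.get((self.board_id, self.malfunction, test_point))
        # (c) Communicate: queue the action for the training computer.
        self.outbox.append({"action": "probe", "point": test_point, "reading": reading})
        return reading

    def accept_instruction(self, instruction):
        """(c) Accept an instruction from the training computer."""
        if instruction.get("set_malfunction") is not None:
            self.malfunction = instruction["set_malfunction"]


sim = CarrelSimulator("board_1")
normal_reading = sim.probe(3)
sim.accept_instruction({"set_malfunction": "open_lamp"})
faulty_reading = sim.probe(3)
```

Keeping the responses in a table rather than in program logic is one plausible reading of "based on tables which define responses": a new schematic board would then need new table rows, not new code.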

Figure 2 shows a typical circuit which is simulated on a
schematic board. The solid line circles are accessible test points
provided in this schematic board for probe insertion. The dashed line
circles are additional test points in the control panel which are
available to allow flexibility in circuit lay-out of other schematic
boards. There are 28 possible test points for use in lay-out. The
lamps and switch are shown only schematically on the schematic board.
However, actual lamps and a switch are mounted directly below the
schematic board. It was decided to mount these components off the
actual circuit diagram to allow maximum versatility in the location of
these simulated components on various schematic boards. Schematic
board I.D. switches are mounted on the control panel. Varied
combinations of holes can be drilled in the schematic boards which determine
which of the I.D. switches are activated. This allows the
identification of the schematic board on the control panel.
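
One way to picture the board identification scheme is to treat each I.D. switch as one binary digit. The sketch below is an assumption for illustration only: the paper does not state how many I.D. switches exist, which bit order is used, or how codes map to boards, and the function name is hypothetical.

```python
def board_id_from_switches(switch_states):
    """Combine I.D. switch states into an integer code.

    Assumption: the first switch is the least significant bit and an
    activated switch contributes a 1.
    """
    code = 0
    for position, activated in enumerate(switch_states):
        if activated:
            code |= 1 << position
    return code


# Hypothetical example: switches 0 and 2 activated, switch 1 not activated.
example_code = board_id_from_switches([True, False, True])
```

With n switches this encoding distinguishes up to 2**n hole patterns, which is why a few switches would be enough to tell a set of interchangeable boards apart.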
FIGURE 2. EXAMPLE OF TYPICAL SCHEMATIC BOARD CIRCUIT


RESULTS  OF  EVALUATION 


The results of the analysis of achievement for the experimental
group and a control group using paper and pencil programmed text with
a circuit "trainer" showed no significant differences. The baseline
measures of math scores, reading level and Block 1 test scores for
each group were not statistically different. The learning and
performance measures were also equivalent for both groups, e.g., the
Block 2 test scores, the retention test scores, the practical
performance test scores, and the time taken to complete the module were
not significantly different.


CONCLUSIONS 


The equivalent achievement measures and positive attitudes of the
students indicate a potentially very satisfactory teaching mechanism.
The experimental system has provided insight into many of the
contingencies of hardware/software interfacing and instruction in
troubleshooting by a performance training simulator. Transfer of this
information to the design and implementation of additional systems, and
more complete utilization of the information generated by the CAI
capability of the carrel will provide the Air Force and the training
community in general with a powerful instructional tool.
In order to retain the integrity of the experimental design for
measuring the effectiveness of carrel versus programmed instruction,
every effort was made to hold all of the other independent variables
constant. As a result of this constraint, the carrel instruction
closely followed the existing study guide and lesson plan. However,
the plans of instruction for delivery by one method of instruction
may not be optimal for another mode of delivery. That is, alternative
instructional strategies may be more appropriate for carrel
presentation. Methods of presentation for carrel delivery should be examined,
since the CAI-simulator delivery mechanism may be enhanced by a
different format, order of material, or type of presentation.
ABSTRACT

This  paper  discusses  a  method  presently  being  developed  for  evaluating 
operator  performance  on  a  Tactical  Operational  Simulator/  Trainer  (TOS/T). 

Quantifying operator performance has been an important area of
research for several years. A common problem for many of these performance
studies  was  their  inability  to  duplicate  standard  scenarios  or  sets  of 
events.    Compounding  the  problem  was  their  inability  to  obtain  baseline 
measures  of  these  scenarios  while  the  system  operated  in  an  automatic  mode. 

Recent  technological  advances  have  made  it  possible,  however,  to 
duplicate  standard  scenarios,  and  to  execute  and  reexecute  them  on 
computer-driven  training  simulators.    These  scenarios  also  can  be 
executed  in  either  an  automatic  mode,  which  is  non-operator  dependent, 
or  a  semi-automatic  mode,  which  requires  some  operator  performance  in  a 
more  systematic  fashion. 

The  automatic  mode  capability  enables  researchers  to  establish 
baseline  measures  of  scenarios.    These  baseline  measures  then  become  the 
yardstick for evaluating operator performance in the semi-automatic mode.
The  presentation  of  standard  scenarios  or  sets  of  events  permits  comparisons 
of  operator  performance  for  repeated  trials,  in  addition  to  comparisons 
among  operators.    Furthermore,  this  method  of  evaluating  operator  performance 
enables  developers  of  training  scenarios  to  produce  objective-oriented 
training  programs. 


1.    The  views  expressed  in  this  paper  are  those  of  the  author  and  do  not 
necessarily  reflect  the  views  of  the  Army  Research  Institute  or  the 
Department  of  the  Army. 


INTRODUCTION 


Past research regarding operator performance on radar systems has
focused on man's ability to detect, track and identify targets, and make
engagement decisions. The research has been conducted in various
environments: real and simulated, and computer assisted and non-computer
assisted. None of these environments, though, has permitted researchers to
capture
the  sequence  of  system  actions  and  use  these  actions  as  a  baseline  for 
comparison  with  actual  operators'  actions.    In  an  automatic  mode,  however, 
system  software  could  be  used  to  generate  an  "ideal"  sequence  of  actions 
which  could  be  compared  to  operators'  actions.    Operators'  performance 
could  be  evaluated  by  noting  the  difference  between  the  system's  (model) 
actions  and  the  operators'  (actual)  actions. 
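
A minimal sketch of such a comparison is given below. It is an illustration only: the paper does not specify how the difference between model and actual actions is to be scored, so the action names, the use of a longest common subsequence, and the "agreement" ratio are all assumptions introduced here.

```python
def longest_common_subsequence(model, actual):
    """Length of the longest run of actions that appears in both lists in the same order."""
    rows, cols = len(model) + 1, len(actual) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if model[i - 1] == actual[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def compare_actions(model, actual):
    """Summarize how an operator's actions differ from the system's model sequence."""
    in_order = longest_common_subsequence(model, actual)
    return {
        "omitted": [a for a in model if a not in actual],  # model actions never taken
        "extra": [a for a in actual if a not in model],    # actions the model did not take
        "in_order": in_order,
        "agreement": in_order / len(model) if model else 1.0,
    }


# Hypothetical action sequences for one scenario.
model_actions = ["detect", "track", "identify", "engage"]
operator_actions = ["detect", "identify", "engage", "hold"]
report = compare_actions(model_actions, operator_actions)
```

Here the operator skipped "track" and added "hold", so three of the four model actions were matched in order.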

The Army's newest major air defense weapon system, PATRIOT, is "the
first truly automated, fully software driven air defense weapon system."
Because  the  PATRIOT  system  software  is  truly  automatic,  it  can  be  used 
to  generate  an  "ideal"  sequence  of  actions  for  research  and  training 
purposes. 

The purpose of this paper is to discuss a method presently being
developed to assess and evaluate operator performance on the PATRIOT
system console. The research and training will be performed on a
Tactical Operational Simulator/Trainer (TOS/T) (see Figure 1), which
will permit a scenario to be run and re-run in automatic mode and will
allow comparisons to be made between model and actual actions.

METHOD 

Several  tasks  will  have  to  be  completed  before  the  collection  and 
evaluation  of  operators'  performance  data  can  be  begun.    These  prerequisite 
tasks include the following:

1.  Developing  an  Automated  Training  Scenario  Generated  Program 
(ATSGP).    The  ATSGP  should  minimize  the  time  it  will  take  to 
develop  training  scenarios. 

2.  Designing  objective  based  scenarios.    These  scenarios  will  be 
used  to  train  console  operators  to  locate  and  use  the  console 
buttons,  i.e.,  the  operators  will  be  trained  to  determine 
where  the  buttons  are  and  how  they  elicit  various  system 
software  actions. 


3.  Modifying existing system simulation software. The system
simulator software, as it is now, will have to be modified
before system data or operators' actions data can be collected.

4.  Preparing training material. A set of training materials will
be prepared to train console operator tasks.

5.  Validating training materials. The training materials will be
validated before actual collection of the data.


Once these prerequisite tasks have been completed, the collection of
data will begin. Data for the system will be collected in real time for
standard scenarios. Data for the operators will be collected at
preplanned intervals of one-tenth of a second for standard scenarios. Although
the data will be collected in real time, it will be evaluated offline,
because the additional requirements demanded by the evaluation procedure
may degrade real-time simulation performance.

The data to be collected are discussed in the next section of this
paper.

Proposed  Model 

The variables to be manipulated in the proposed model include
the number of threats ($T_L$) in a given scenario (see Figure 2), the
number of resources spent per threat, and the effect of asset value.

The ratio of incapacitated threats ($N_i/T_L$) yields a measure of
relative fire section effectiveness (EFF). Since each threat may attack
assets, the EFF measure also accounts for threats that penetrate. Therefore
the product of the ratio of incapacitated threats and the reciprocal of
penetrators yields a refined EFF measure, depicted as:

$$\mathrm{EFF} = \frac{N_i}{T_L} \cdot \frac{1}{P}$$

where $P > 0$.

If $P = 0$, however, $P$ is reset to 1.


To simplify this reset problem, due to the possibility of a given
threat not penetrating, the model is revised as shown.

$$\mathrm{EFF} = \frac{N_i}{T_L} \cdot \frac{1}{TP}$$

where $TP = 1 + P$.

A given set of training scenario stimuli may result in a battle time
of only twenty seconds. Battle time is defined as the difference between
the time the first vehicle appears on the screen and the last vehicle moves
off the screen. The EFF measure to be used in analyzing training scenarios
requires a unit of time for system response to threat engagement ($t_e$)
status and a unit of time to incapacitate ($t_i$) the threat. This amount of
time ($t_i$) is incorporated into the model as the reciprocal of threat
time ($1/t_i$). Therefore, the model takes the form:

$$\mathrm{EFF} = \frac{N_i}{T_L} \cdot \frac{1}{TP} \cdot \frac{1}{t_i}$$

where $t_i = t_t - t_e$, which yields a measure of EFF.

The number of resources spent per threat is represented in the model
as follows:

$$1 + N_r - N_i$$

where $N_r$ = number of resources spent, and

$N_i$ = number of incapacitated threats.

To incorporate this unit measure of resource effectiveness into the
model the reciprocal of the difference of number of resources spent and
number of incapacitated threats plus a constant of one is multiplied by
the product of the incapacitated threats, threat penetrators, and time to
incapacitate threats ratios. Therefore, the model assumes the form:

$$\mathrm{EFF} = \frac{N_i}{T_L} \cdot \frac{1}{TP} \cdot \frac{1}{t_i} \cdot \frac{1}{RS}$$

where

$$RS = 1 + N_r - N_i$$


The third variable is the effect of asset value or asset threat
weight which impacts on the value of threat penetrators (TP).

A preliminary analytical evaluation will be conducted to determine
the relationship between the asset threat weight and TP. In order to
give the asset threat weight some value the following assumption is made:
D is equal to the value of four minus the asset threat weight, where the
asset threat weight assumes the values of one to three. The model now is
shown as:

$$\mathrm{EFF} = \frac{N_i}{T_L} \cdot \frac{1}{TP(D)} \cdot \frac{1}{t_i} \cdot \frac{1}{RS}$$

The model is designed to evaluate one threat.

To generalize across x number of threats an EFF value representing
the average of the sum across x number of threats is produced. The final
form of the proposed model is:

$$\mathrm{EFF} = \frac{1}{x} \sum_{k=1}^{x} \left[ \frac{N_i}{T_L} \cdot \frac{1}{TP(D)} \cdot \frac{1}{t_i} \cdot \frac{1}{RS} \right]_k$$
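
The proposed measure can be worked through numerically with a short sketch. Several readings here are assumptions rather than statements from the paper: TP(D) is taken to mean the product of TP and D, the time term is taken to be the time to incapacitate the threat, and all function and parameter names are hypothetical.

```python
def eff_single(n_incap, n_threats, penetrators, t_incap, n_resources, asset_weight):
    """EFF for one threat record, following the verbal definitions in the text."""
    if asset_weight not in (1, 2, 3):
        raise ValueError("asset threat weight is assumed to be 1, 2 or 3")
    if t_incap <= 0 or n_threats <= 0:
        raise ValueError("time to incapacitate and number of threats must be positive")
    tp = 1 + penetrators            # TP = 1 + P, avoids division by zero when P = 0
    d = 4 - asset_weight            # D = 4 minus the asset threat weight
    rs = 1 + n_resources - n_incap  # RS = 1 + Nr - Ni
    if rs <= 0:
        raise ValueError("resources spent are assumed to be at least the threats incapacitated")
    # Assumption: TP(D) is read as the product TP * D.
    return (n_incap / n_threats) * (1 / (tp * d)) * (1 / t_incap) * (1 / rs)


def eff_average(threat_records):
    """Average of the single-threat EFF values across x threat records."""
    values = [eff_single(**record) for record in threat_records]
    return sum(values) / len(values)


# Hypothetical scenario with two threat records.
records = [
    dict(n_incap=1, n_threats=4, penetrators=0, t_incap=2.0, n_resources=1, asset_weight=3),
    dict(n_incap=1, n_threats=4, penetrators=1, t_incap=4.0, n_resources=2, asset_weight=2),
]
scenario_eff = eff_average(records)
```

Under these assumptions the first record scores 0.125 and the second 0.0078125, so a slower engagement that spends an extra resource and lets one threat penetrate lowers the measure sharply.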

In summary, the model provides the researcher with a means for
determining the system's performance on a given scenario. The system's
effectiveness measure is used as the "yardstick" or baseline for evaluating
operators' performance on the scenario. Operators' (actual) actions are
then evaluated against the system's (model) actions. An evaluation of
model versus actual performance permits review and analysis of operators'
strengths and weaknesses. Subsequently, training material can be developed
to remove noted operators' weaknesses.

The proposed model is only in its developmental stages. Numerous
revisions and refinements are anticipated. The evolving nature of the
model will reflect enhancements that improve both the validity and reliability
of the effectiveness measure.

GENERALIZABILITY  AND  FUTURE  RESEARCH 

The methodology discussed in this paper is being developed specifically
for the PATRIOT Air Defense Weapon System. It is anticipated, however,
that this methodology of evaluating operator performance will be expanded to
encompass additional systems requiring operator performance on radar systems
such as those found in the Army, Air Force, Navy, Marines, and Coast Guard.
Generalizing the methodology to the Federal Aviation Administration's training
of Air Traffic Controllers also is envisioned.

This  methodology,  although  in  its  preliminary  stages  of  development, 
offers  promising  possibilities  for  the  evaluation  of  operator  performance. 
CRITICAL PERFORMANCES OF BATTALION COMMAND GROUPS*
Army Research Institute for the Behavioral and Social Sciences
Field  Unit,  Fort  Leavenworth,  Kansas 


Abstract 

The behavior of 23 battalion command groups was investigated in a
simulated combat environment provided by the Combined Arms Tactical
Training Simulator (CATTS). Thirteen mechanized groups performed a
covering force operation followed by an attack, and ten non-mechanized
groups performed a defense and an attack. Their performance was rated
on items derived from the subtasks of the battalion command group ARTEP
(Army Training and Evaluation Program). Fifteen subtasks were
identified as critical, because they or their elements were both low-rated
and highly correlated with ratings of overall effectiveness.
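
The two-part criterion for criticality (low rating and high correlation with overall effectiveness) can be sketched as follows. The cutoff values, rating scale, subtask names, and data are hypothetical; the abstract does not report the thresholds or the statistic actually used, and a Pearson correlation is assumed here only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def critical_subtasks(ratings, overall, max_mean=3.0, min_r=0.5):
    """Flag subtasks that are both low-rated and highly correlated with overall ratings.

    max_mean and min_r are illustrative cutoffs, not values from the paper.
    """
    critical = []
    for name, scores in ratings.items():
        mean_rating = sum(scores) / len(scores)
        if mean_rating < max_mean and pearson_r(scores, overall) > min_r:
            critical.append(name)
    return critical


# Hypothetical ratings of five command groups on a 1-5 scale.
overall_effectiveness = [1, 2, 3, 4, 5]
subtask_ratings = {
    "A": [1, 2, 2, 3, 3],  # low-rated and tracks overall effectiveness
    "B": [4, 5, 4, 5, 4],  # highly rated, so not critical
    "C": [2, 1, 2, 1, 2],  # low-rated but unrelated to overall effectiveness
}
flagged = critical_subtasks(subtask_ratings, overall_effectiveness)
```

Requiring both conditions keeps attention on subtasks where performance is weak and where that weakness appears to matter for overall effectiveness.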

The  four  missions  observed  in  this  investigation  were  markedly 
different  with  respect  to  subtask  criticality.    All  but  one  of  the  15 
critical  subtasks  were  identified  in  the  covering  force  mission,  five 
subtasks  were  critical  in  the  mechanized  attack,  one  in  the  defense, 
and one in the non-mechanized attack.

Rater  reliability  was  low.    The  coefficient  of  reliability  was  only 
.22  for  scores  from  a  single  rater.    It  increased  to  .55  when  the  scores 
from  four  or  five  raters  were  averaged.    The  differences  among  ratings 
of  the  same  command  group  by  different  observers  were  significant  beyond 
the  .001  level.    These  results  indicate  a  need  for  further  research  to 
develop  more  objective  measures  of  command  group  performance. 
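
The rise from .22 for a single rater to .55 for the average of four or five raters is roughly what the Spearman-Brown formula would predict. The abstract does not say how the pooled coefficient was obtained, so the sketch below is only an illustrative check under that assumption; the function name is hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Predicted reliability of the mean of n_raters equally reliable raters."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Reliability predicted for averages of four and five raters, starting from .22.
predicted = {k: round(spearman_brown(0.22, k), 2) for k in (4, 5)}
```

The predicted values are about .53 for four raters and .59 for five, which bracket the reported .55.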


*The views expressed herein are the authors' and are not necessarily
endorsed by the U.S. Army.

INTRODUCTION


BACKGROUND 

In recent years, time and resource constraints have provided
increased impetus for the development of more efficient military
training systems. The Army Training and Evaluation Program (ARTEP) and the
various battle simulations are prime examples of such systems.

The Combined Arms Training Developments Activity (CATRADA) at Fort
Leavenworth, Kansas is responsible for developing ARTEPs and battle
simulations for battalion, brigade, and division command groups.¹ At
the battalion level, CATRADA has developed the Command Group/Staff
Module of ARTEP 71-2,² which specifies the training objectives for the
battalion commander and his staff. In addition, CATRADA is developing
four  different  battle  simulations  for  training  battalion  command  groups. 
Each  simulation  has  its  own  unique  capabilities  and  limitations.  Pegasus 
is  a  manual  control  system  for  battalion  and  brigade  CPX's  (command  post 
exercises).    BATTLE  (Battalion  Analyzer  and  Tactical  Trainer  for  Local 
Engagements)  is  played  on  a  terrain  board  with  the  aid  of  a  mini-computer. 
CAMMS  (Computer  Assisted  Map  Maneuver  System)  exercises  battalion,  or 
brigade  and  battalion,  command  groups  via  terminals  linked  by  telephone 
to  a  large,  time-shared  computer.    CATTS  (Combined  Arms  Tactical  Training 
Simulator)  is  the  most  realistic  and  the  most  completely  automated  battle 
simulation available for training battalion command groups. It is
supported by a large, dedicated computer and a full-time controller staff.
CATTS  is  permanently  located  at  Fort  Leavenworth,  but  a  remote  version 
is  being  developed  that  will  be  able  to  provide  exercises  at  a  unit's 
home  station.    These  battle  simulations  and  the  command  group  ARTEP  are 
subsystems  within  a  larger  system  for  training  battalion  commanders  and 
their staffs. Courses taught in Army schools, and CPX's and field
exercises conducted by the units themselves are also elements of the command
group  training  system. 

The systems approach to training, as described in the Instructional
Systems Development Model,³ is the approved methodology to be followed


¹Battle Simulations and the ARTEP. Combined Arms Training Developments
Activity, Ft Leavenworth, KS, 18-461, November 1977.

²Army Training and Evaluation Program for Mechanized Infantry/Tank
Task Force, No. 71-2, Headquarters, Department of the Army, Washington,
D.C., 17 June 1977.

³Interservice Procedures for Instructional Systems Development.
TRADOC Pamphlet 350-30. Fort Monroe, VA: U.S. Army Training and
Doctrine Command, 1975.
in the development of military training systems. A simple outline of
the systems approach, similar to Eckstrand's model,⁴ will serve to place
the  present  research  in  the  context  of  a  systems  approach  to  training 
development. 

The  development  of  a  training  system  can  be  described  as  a  seven 
stage  process: 

1.  Define  the  training  objectives. 

2.  Develop  measures  of  performance. 

3.  Derive  the  training  content. 

4.  Design  training  methods  and  materials. 

5.  Conduct  training. 

6.  Evaluate  trainee  performance. 

7.  Provide  feedback  to  modify  content,  methods  and  materials. 

The  relationships  among  these  stages  are  diagrammed  in  the  flow  chart  in 
Figure 1, and their relevance to the development of command group
training is elaborated below.

The Command Group/Staff Module of the ARTEP defines the objectives
of the battalion command group training system. The module comprises
12 tasks, which are broken down into a total of 61 subtasks, a brief
statement of the conditions under which each task and subtask is
performed, and a general description of the performance standards for each
task and subtask. The tasks include such actions as Develop a plan
based on mission, Prepare and organize the battlefield, See the
battlefield during the battle, and Concentrate/shift combat power.

The measurement instruments developed for this research are
questionnaires answered by experienced evaluators who observe the command
group's behavior in a simulated combat environment. The questionnaire
items were derived from the command group ARTEP and from previous research
on the performance of battalion command groups in simulated combat.^5


^4 Eckstrand, G. A. Current Status of the Technology of Training.
AMRL Document, Technical Report 64-86, September 1964.

^5 Barber, H. F. and Kaplan, I. T. Battalion Command Group Performance
in Simulated Combat. ARI Technical Paper. In press.

Figure 1. A systems approach to training development.

The training content should also be derived from the objectives,
because it should be directed toward satisfying those objectives. In
the  case  of  verbal  material,  the  content  is  expressed  in  the  topics, 
subjects  and  substance  of  books,  lectures  and  courses.    For  battle 
simulations,  the  content  is  specified  by  the  exercise  scenarios,  which 
determine  the  terrain  on  which  the  simulated  battle  is  fought,  the  mis- 
sion that  the  command  group  is  assigned,  and  the  ARTEP  tasks  and  sub- 
tasks  that  must  be  performed  to  accomplish  the  mission. 

The  methods  and  materials  for  training  battalion  commanders  and 
staffs  include  lectures,  field  manuals,  exercises,  and  battle  simula- 
tions.   These  simulations  are  used  both  to  conduct  training  and,  with 
the  aid  of  performance  measures  based  on  the  ARTEP,  to  evaluate  the 
effectiveness of previous training. The feedback that an individual
command group receives about its weaknesses enables it to modify its
own training program to address those weaknesses. Knowledge of the
weaknesses common to many command groups enables the Army to improve
the total training system.


PURPOSE 

The present investigation was concerned with the second and seventh
steps in the system development process: deriving performance measures
from the training objectives, and providing feedback to improve training
content, methods and materials. This effort contributed directly to
two aims of the ARI Field Unit's research on command and control
training: to identify critical command group performance requirements at
battalion and higher command levels, and to develop ways of measuring
these performances through the use of battle simulations. More
specifically, the purposes of this investigation were as follows:

1. To develop a battery of performance measures that can be used
to evaluate (a) specific command-group performances, and (b) the overall
effectiveness of individual staff members and the command group as a
whole.

2. To identify the command group performances that are most
important for training in terms of (a) low performance ratings and (b) high
correlations with overall effectiveness ratings. In other words, to
identify those performances that are deficient and/or strongly related
to overall effectiveness.

METHOD


BATTALION  COMMAND  GROUPS 

Data  were  collected  from  13  mechanized  and  10  non-mechanized  units 
stationed  in  the  Continental  United  States.    The  mechanized  units 
included  five  infantry,  three  armor  and  five  cavalry  battalions.  The 
non-mechanized  units  were  all  infantry  battalions.    Three  mechanized 
infantry  battalions  were  National  Guard  units,  and  all  the  other  units 
were  Active  Army,  as  shown  in  Table  1. 

The battalion command group typically comprised the commander, S1,
S2, S3, S4, an air liaison officer (ALO), a fire support coordinator,
an operations sergeant, intelligence sergeant, assistant S2 and/or S3
air, fire support NCO, and one or two radio/telephone operators.


EXERCISES 

Each  command  group  was  observed  during  the  performance  of  two  mis- 
sions in  the  CATTS  facility  at  Fort  Leavenworth.    The  particular  missions 
assigned  to  a  group  depended  on  the  type  of  unit  it  commanded,  as  shown 
in Table 2. Mechanized units performed a covering force operation
followed by a daylight attack as part of a larger force. Non-mechanized
units  first  performed  a  defense  and  then  a  non-illuminated,  non-supported 
night  attack.    Differences  in  mobility  and  probable  real-world  missions 
determined  the  types  of  missions  assigned.    Active  Army  groups  conducted 
their  two  missions  during  a  three-day  exercise.    National  Guard  groups 
performed  one  mission  per  day  during  a  two-day  weekend  exercise. 

SIMULATION  SYSTEM 

The battlefield environment was simulated by the Combined Arms
Tactical Training Simulator (CATTS), which provides a computer-driven
exercise to train maneuver-battalion commanders and their staffs in the
control and coordination of combined-arms operations. It simulates the
actions of units in combat, moves elements on and above the battlefield,
calculates intervisibility and detection between forces, weapon-to-target
ranges and the effects of weapons employment, and it maintains
the status of personnel, equipment, ammunition and fuel for friendly and
enemy forces. Speed of movement, line of sight, and weapons effects are
affected by changes in weather, terrain contour and soil type,
suppressive fires, and personnel and equipment status. The CATTS exercise is




TABLE 1
Number of Command Groups

Type of Unit          Active Army    National Guard

Mechanized
  Infantry                 2               3
  Armor                    3               0
  Cavalry                  5               0

Non-mechanized
  Infantry                10               0


TABLE 2
Type of Mission

Type of Unit          Mission 1          Mission 2

Mechanized            Covering Force     Attack
Non-mechanized        Defense            Attack

conducted in a real-time, free-play mode. Within the prescribed
tactical situation, the battalion commander can employ his assets in any
manner  he  deems  appropriate.    The  only  constraints  are  the  assets  avail- 
able to  the  battalion  and  the  actions  of  the  enemy  commander. 

In this research, the command group occupied a simulated tactical
operations center (TOC), except for the S1 and S4 who were in another
area designated as the combat trains. The players (the battalion
command group) in both areas were provided with communications equipment
normally found in a maneuver battalion. They could communicate with
higher, lower and adjacent units (played by controllers) in any manner
consistent with Army procedure and with the simulated location of the
various units: face to face, by telephone or radio, and by written
message. Figure 2 illustrates the communications among the players,
the controllers and the computer. Most communication took place over
radio and telephone. The battalion command group had seven radio nets
(actually hard-wired) with appropriate alternate frequencies. The nets
included the brigade command net, the brigade intelligence net, the
brigade administration/logistics net, the battalion command net, the
fire support net, and the air support net. In addition to the radio
nets, the command group also had a RATT (radio-teletype) unit and field
telephones, when appropriate. The sounds of enemy jamming, battle, and
engine and generator noise were generated during the exercise to add to
the realism of the experience.

CONTROLLERS 

A team of controllers, permanently assigned to CATTS, mediated
between the players and the computer. The control group included a
chief controller, who played the role of brigade commander, a brigade
S1/S4 controller, who also played the roles of service-support-unit
commanders and executive officers, a brigade S2/S3 controller, four
maneuver- and supporting-unit commanders, a fire support controller, two
forward observers, a direct air support controller (DASC), and a threat
controller. The DASC was played by a different Air Force officer each
time. The monitor was an adjunct member of the control group, who
observed the command group during the exercise and provided feedback to
the players during the post-game critique. This position was a rotating
assignment among faculty members of the Command and General Staff College
who had served as battalion commanders or staff members and held the rank
of lieutenant colonel.

The members of the control group are listed in Table 3. Three
controllers, identified as interactors, input orders to the computer through
a control console: (1) the command and control interactor relayed orders

[Figure 2: diagram not reproduced. Positions shown:

Brigade Level (controllers): Brigade Commander; S1/S4
(Administration/Logistics); S2/S3 (Intelligence/Operations); FSE (Fire
Support Element); DASC (Direct Air Support Center).

Battalion Level (players): Battalion Commander; S1 (Administration);
S2 (Intelligence); S3 (Operations); S4 (Logistics); FSCOORD (Fire
Support Coordinator); ALO (Air Liaison Officer).

Also shown: company-level units, the computer, and the enemy commander
at regimental level. Labeled links: Orders and Information; Information
and Requests; Instructions.]

Figure 2. Communication between controller and player positions in CATTS.
Controller positions are inclosed by broken lines.

from the battalion command group to the maneuver units modeled in the
computer, (2) the fire support interactor input orders to the artillery
and  air  support  units,  and  (3)  the  threat  interactor,  working  independ- 
ently, controlled  the  enemy  force.    The  results  of  simulated  movements 
and  engagements  were  displayed  on  television  screens  to  the  controllers, 
who  transmitted  relevant  information  to  the  players  via  radio  or  telephone. 
Except  for  the  threat  interactor  and  the  monitor,  all  controllers  acted 
the  roles  of  higher,  lower  or  adjacent  unit  personnel.    In  addition  to 
their  other  functions,  eight  controllers  also  filled  out  observation  forms 
on  which  they  evaluated  the  command  group's  performance  in  the  areas  des- 
ignated in  Table  3. 


PERFORMANCE  MEASURES 

The  documentation  of  the  observation  forms  that  were  used  to  evaluate 
the  performance  of  battalion  command  groups  in  simulated  combat  is  one  of 
the  principal  objectives  of  this  report.    These  forms  constitute  a  source 
of  test  items  that  can  be  drawn  upon  by  exercise  directors  or  battalion 
and brigade commanders. The items can also be used by researchers to
study command group training and performance. While still subject to
refinement, especially with respect to increased objectivity, these
observation forms were superior to those previously employed. The major
improvements, which were based on extensive experience using the earlier
forms, were (1) the introduction of a five-point scale for rating the
performance of ARTEP subtasks and (2) the addition of many specific
questions that were answered yes or no. The five-point rating scale
permitted finer discriminations than the three-point scale previously
used, and the yes/no questions provided more detailed information about
the components or elements of subtask performance.

Four  different  observation  forms  were  filled  out  by  the  evaluators: 

1. A form concerned with administration and logistics was completed
by the brigade S1/S4 controller.

2. Intelligence and operations forms were completed by the brigade
S2/S3 controller and by the controllers who played company commanders.

3. A fire support form was completed by the fire support controller.

4. An observation form covering all the preceding areas, in somewhat
less detail, was completed by the monitor.

Each observation form had two or three versions, appropriate to the
mission that was played. The S1/S4, FS and monitor's forms had one

TABLE 3
CATTS Control Group

Position                          Rank          Performance Observed

Chief Controller                  LTC           -
Brigade S1/S4                     MAJ           Administration and Logistics
Brigade S2/S3                     MAJ           Intelligence and Operations
Command and Control Interactor    CPT           Intelligence and Operations
Company Controller                MAJ           Intelligence and Operations
Company Controller                CPT           Intelligence and Operations
Company Controller                CPT           Intelligence and Operations
Unit First Sergeants              SSG           -
Fire Support Interactor           CPT           -
Artillery Controller              LTC           Fire Support
Artillery Controller              CPT           -
Artillery Controller              SSG           -
Division Air Support Center       CPT to LTC    -
Threat Interactor                 CPT           -
Monitor                           LTC           ARTEP Subtasks

version for the covering force or defense and another version for the
mechanized  or  non-mechanized  attack.  The  intelligence  and  operations 
form  had  one  version  for  the  covering  force,  another  for  the  defense, 
and  a  third  for  the  mechanized  or  non-mechanized  attack.  Most  of  the 
items  on  a  given  form  were  the  same  in  both  or  all  three  versions,  but 
some  items  were  unique  to  the  mission. 

The performances evaluated by the brigade S1/S4 controller included
subtasks and elements of subtasks concerned with providing supplies and
maintaining equipment before and during the battle (Subtasks 3J, 3K,
9A and 9B), supporting the troops (9C), and integrating combat service
support (CSS) into the scheme of maneuver (9D). The S1/S4 controller
also rated the overall effectiveness of the battalion S1 and S4 in
comparison with those of previous command groups.

The brigade S2/S3 controller and the company commanders evaluated
subtasks and elements concerned with intelligence preparation of the
battlefield (1B, 2A, 2B, 2C, 2D), analyzing friendly capabilities (1D),
selecting routes and positions (1F, 1G, 1H), organizing for combat (3C),
communicating plans and orders (3D, 3F, 3G), seeing the battlefield
during the battle (5P, 5C, 5D), troop leading during the battle (11A),
communicating changes (6B), concentrating combat power (8A, 8B, 8C, 8D),
and reacting to enemy electronic warfare (10A, 12A). They also rated
the overall effectiveness of the battalion S2, S3 and the command group
as a whole in comparison with previous S2's, S3's and command groups.

The fire support controller evaluated the fire support plan (1I),
priority of fires (1J), fire support coordination (1L), modification of
the fire support plan during the battle (7A), and the overall
effectiveness of the battalion fire support element in comparison with that of
previous groups.

The monitor evaluated all the above types of performance by rating
subtasks of the command group ARTEP. The only subtasks he did not rate
were  those  that  were  not  played  in  the  exercise  and  those  he  could  not 
observe.    He  also  evaluated  the  degree  to  which  the  mission  was  accom- 
plished, and  the  overall  effectiveness  of  the  battalion  commander  and 
the  command  group  as  a  whole.    Since  the  monitor  did  not  have  as  much 
experience  observing  command  groups  as  the  permanent  controllers,  he 
did  not  rate  them  in  comparison  with  previous  groups,  but  used  the  five- 
point  performance  scale  instead. 

The scales that were used to rate performance, overall effectiveness
and mission accomplishment are listed below:

The alternative responses on the five-point performance scale were
defined  as  follows: 

1  -  Completely  overlooked,  forgotten 

2  -  Major  deficiencies 

3  -  Minor  deficiencies 

4  -  Satisfactory 

5  -  Excellent 

The overall effectiveness of the command group as a whole and of its
individual members was rated in comparison with previous command groups
and their members on the following scale:

1  -  One  of  the  worst 

2  -  Worse  than  average 

3  -  Average 

4  -  Better  than  average 

5  -  One  of  the  best 

The  alternative  responses  for  the  monitor's  evaluation  of  mission 
accomplishment  were: 

1 - Failed to accomplish any part of the mission

2  -  Failed  to  accomplish  most  of  the  mission 

3  -  Accomplished  about  half  of  the  mission 

4  -  Accomplished  most  of  the  mission 

5  -  Accomplished  all  of  the  mission 


DATA  ANALYSIS 

The  primary  objectives  of  the  data  analysis  were  to  identify  those 
performances  that  were  deficient  and  those  that  were  highly  correlated 
with  ratings  of  overall  effectiveness.    Performances  were  designated  as 
deficient  when  their  average  scores  were  less  than  or  equal  to  the  mid- 
point of  the  rating  scale.    For  performances  rated  on  a  five-point 
scale, the criterion score was 3.0. For items answered yes or no, the
criterion score was 50% yes. Most of the yes/no items were worded so
that a yes response signified a correct performance in the rater's
judgement, but a few questions were phrased so the proper behavior was
indicated by a no response. Thus, the general definition of deficient
performance  for  yes/no  items  was  an  average  score  less  than  or  equal  to 
50%  correct. 
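
The deficiency criterion just described can be written out compactly.
The following Python sketch is illustrative only: the function names
and the sample ratings are hypothetical and were not part of the
original analysis.

```python
from statistics import mean

SCALE_MIDPOINT = 3.0       # criterion score for five-point items
PERCENT_CRITERION = 50.0   # criterion score for yes/no items


def scaled_item_deficient(group_scores):
    """True when the mean rating over command groups is <= 3.0."""
    return mean(group_scores) <= SCALE_MIDPOINT


def percent_correct(answers, yes_is_correct=True):
    """Percent of command groups whose answer marks correct performance.

    answers holds one boolean per command group (True = yes).  For the
    few items phrased so that "no" is the proper behavior, pass
    yes_is_correct=False.
    """
    correct = [answer == yes_is_correct for answer in answers]
    return 100.0 * sum(correct) / len(correct)


def yes_no_item_deficient(answers, yes_is_correct=True):
    """True when 50% or fewer of the command groups performed correctly."""
    return percent_correct(answers, yes_is_correct) <= PERCENT_CRITERION


# Hypothetical data, for illustration only.
five_point_item = [2.5, 3.0, 3.5]           # mean is exactly 3.0
yes_item = [True, False, False, True]       # 50% yes, yes is correct
reversed_item = [True, True, True, False]   # 75% yes, but yes is wrong
```

Because the criterion is "less than or equal to," an item whose mean
falls exactly on the midpoint (3.0, or 50% correct) counts as deficient.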

A correlation was considered high when it was statistically
significant at the .01 level by the one-tailed test. Pearson r's were computed
for the scaled items and point biserial correlations for the yes/no items.
The .01 criterion of significance was chosen in preference to the less
stringent  .05  level,  because  of  the  large  number  of  correlation  coeffi- 
cients (over  2,000)  that  were  computed.    About  100  would  be  significant 
by  chance  at  the  .05  level,  versus  about  20  at  the  .01  level.    The  one- 
tailed  test  was  used  because  only  a  positive  relationship  was  expected 
between  correct  performance  and  overall  effectiveness. 
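
The arithmetic behind the choice of significance level, and the relation
between the two kinds of coefficient, can be sketched in Python. This is
a minimal illustration under stated assumptions: the function names and
the six sample ratings are hypothetical, and 2,000 is used as a round
figure for the "over 2,000" coefficients. The point biserial coefficient
is numerically the same as a Pearson r computed on an item coded 0/1,
which the sketch checks on the sample data.

```python
from math import sqrt


def expected_chance_hits(n_tests, alpha):
    """Expected count of coefficients significant by chance alone."""
    return n_tests * alpha


def pearson_r(xs, ys):
    """Pearson correlation; None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def point_biserial(flags, ys):
    """Point biserial r for a 0/1 item against a continuous rating.

    Assumes at least one 1 and at least one 0 among the flags.
    """
    n = len(ys)
    ones = [y for f, y in zip(flags, ys) if f == 1]
    zeros = [y for f, y in zip(flags, ys) if f == 0]
    p = len(ones) / n
    my = sum(ys) / n
    sd = sqrt(sum((y - my) ** 2 for y in ys) / n)   # population SD
    gap = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return gap / sd * sqrt(p * (1 - p))


# Hypothetical ratings for six command groups.
performed = [1, 1, 0, 1, 0, 0]        # yes/no item, 1 = correct
effectiveness = [4, 5, 2, 4, 3, 2]    # overall effectiveness rating

hits_at_05 = expected_chance_hits(2000, 0.05)   # about 100
hits_at_01 = expected_chance_hits(2000, 0.01)   # about 20
```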

One problem with using correlation as a measure of criticality is
that the size of the correlation between two variables decreases when
the range of either variable is restricted.^6 For example, if every S4
determined the status of equipment before the battle, there would be no
correlation between performance of this task and the S4's overall
effectiveness ratings, even though failure to perform the task might have
harmful consequences. In fact, however, lack of variation was not a
serious problem in this study, because the typical case of restricted
range occurred when the task was usually performed correctly. It can
be argued that a task which is performed correctly by a given population
is not important for training in that population. Scandura^7 made a
related point, when he wrote that professional competence does not have
to be analyzed to the level of elementary skills. For example, all
accountants can add, so arithmetic ability is not an important variable
for distinguishing among individuals within the population of trained
accountants. Similarly, the tasks that were usually performed correctly
in the present investigation are not critical for training incumbent
command groups. This argument is consistent with the ARTEP philosophy,
which advocates training to correct deficiencies.
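
The effect of limited variation on a yes/no item can be illustrated with
a small Python sketch. The ratings below are invented for ten
hypothetical S4s and are not data from this study; the helper name
pearson_r is likewise an assumption of the sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical overall effectiveness ratings for ten S4s.
effectiveness = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

# The same task scored yes (1) or no (0) under three invented patterns.
half_performed = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
nearly_all_performed = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
all_performed = [1] * 10

r_half = pearson_r(half_performed, effectiveness)
r_nearly_all = pearson_r(nearly_all_performed, effectiveness)
r_all = pearson_r(all_performed, effectiveness)   # None: no variation
```

With these invented numbers the coefficient is about 0.80 when half of
the S4s performed the task, about 0.59 when all but the lowest-rated S4
performed it, and undefined when every S4 performed it.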

The data for each of the four missions were analyzed separately.
First, mean scores were calculated for the items that were rated by
several observers, i.e., the intelligence and operations items that were
rated by the brigade S2/S3 and the company commanders, the overall
effectiveness ratings for the battalion S2 and S3 rated by the same controllers,
and  the  overall  effectiveness  ratings  for  the  whole  command  group  pro- 
vided by  the  same  controllers  plus  the  monitor.    These  ratings  were  aver- 
aged over  observers  to  obtain  a  mean  score  for  the  command  group  on  each 
item.    Then  the  scores  for  every  command  group  on  all  four  observation 
forms  (administration  and  logistics,  intelligence  and  operations,  fire 


^6 Welkowitz, J., Ewen, R.B., and Cohen, J. Introductory Statistics
for the Behavioral Sciences, 2d Ed. New York: Academic Press, 1976.

^7 Scandura, J.M. Structural approach to instructional problems.
American Psychologist, 1977, 32, 33-54.

support, and the monitor's form) were analyzed to produce the two desired
measures of performance for each item:

1. The mean rating averaged over all the command groups that played
a given mission.

2. The correlations between performance on the item and the overall
effectiveness ratings, calculated across all the command groups that
played a given mission.

In  addition  to  the  analysis  of  ratings  and  correlations  for  every 
item  on  the  observation  forms,  analyses  of  variance  were  performed  to 
compare  the  relative  difficulty  of  the  four  different  missions,  and 
measures  of  inter-rater  reliability  were  computed  for  the  items  that 
were  rated  by  several  observers. 
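
The two-step computation described above (averaging over observers, then
summarizing each item across command groups) can be sketched as follows.
This is a minimal Python illustration, not the program used in the
study: the data layout, the function names, and all ratings are
hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation; None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def average_over_observers(ratings):
    """ratings[group][item] is a list with one rating per observer."""
    return {
        group: {item: mean(obs) for item, obs in items.items()}
        for group, items in ratings.items()
    }


def summarize_item(scores, effectiveness, item):
    """Return (mean rating over groups, correlation with effectiveness)."""
    groups = sorted(scores)
    xs = [scores[g][item] for g in groups]
    ys = [effectiveness[g] for g in groups]
    return mean(xs), pearson_r(xs, ys)


# Hypothetical ratings of one item by four observers, for four
# command groups that played the same mission.
ratings = {
    "group_1": {"A7": [2, 3, 2, 3]},
    "group_2": {"A7": [3, 3, 3, 3]},
    "group_3": {"A7": [2, 2, 2, 2]},
    "group_4": {"A7": [4, 3, 4, 3]},
}
effectiveness = {"group_1": 3.0, "group_2": 3.5,
                 "group_3": 2.0, "group_4": 4.5}

scores = average_over_observers(ratings)
item_mean, item_r = summarize_item(scores, effectiveness, "A7")
```

In this invented example the item mean is 2.75, which is below the 3.0
criterion, and the item is positively correlated with the effectiveness
ratings, so it would fall in both categories of interest.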


RESULTS 

The results of this investigation are described below under three
major headings: (1) performance deficiencies, (2) performances
correlated with overall effectiveness, and (3) performances that were both
deficient and correlated with overall effectiveness. Under each heading,
four sets of command group performances are considered in turn:
(1) administration and logistics, (2) intelligence and operations, (3) fire
support and (4) the subtasks rated by the monitor. Within each of these
sets, the results are presented for each of the four missions:
(1) covering force, (2) mechanized attack, (3) defense and (4) non-mechanized
attack.


PERFORMANCE  DEFICIENCIES 

A given performance was considered deficient when it was rated
incorrect for 50% or more of the command groups, in the case of items
answered yes or no, or when its average score was less than or equal to
3.0, in the case of items rated on a five-point scale. All the
deficient performances are listed in Tables 4 through 7.

A  performance  that  was  deficient  in  one  mission  was  not  necessarily 
deficient  in  another.    To  make  the  deficient  performances  stand  out 
more  clearly,  satisfactory  performance  scores  are  not  included  in  Tables 
4  through  7.    The  mean  score  for  each  item  is  entered  in  the  column  that 
corresponds  to  the  mission  in  which  the  performance  was  deficient.  The 
entries  with  decimal  points  are  mean  scores  on  the  five-point  scale. 
Percentage entries indicate the percent of yes responses to items that
were answered yes or no. The entry N/A (not applicable) means that
there is no score because the item did not appear on the observation
form for that mission.

Items that correspond to ARTEP subtasks in Tables 4 through 6 are
identified by the subtask label in parentheses after the item. The
elements of a subtask are listed before it. When elements of a subtask
were deficient, but the overall subtask was not, the subtask label is
given before the elements.

Administration and Logistics. Table 4 shows that the deficiencies
in administration and logistics had to do with providing supplies and
equipment during the battle (Subtask 9A), and integrating combat service
support into the scheme of maneuver (Subtask 9D). Four performances
were deficient in the covering force operation (Items 6d, 6e, 9b and 9c
on the observation form), none in the mechanized attack, one in the
defense (6d), and one in the non-mechanized attack (6g). Obviously,
the specific deficiency in the use of transportation assets (9b)
contributed to the more general weakness in the integration of CSS into
the scheme of maneuver (9c) in the covering force.

Intelligence and Operations. All the items that were deficient
according to the average ratings of the S2/S3 and company commander
controllers are listed in Table 5. There were deficiencies in eight
of the ten categories of items; the two exceptions were B. Friendly
considerations, and C. Organize for combat. Because the items within
each category varied from one mission to another, a given item on the
observation form for one mission generally had a different number on
another form. For ease of reference, therefore, the items in Table 5
have been renumbered consecutively within each category.

Six  items  were  deficient  in  at  least  three  of  the  four  missions: 

A7. The intelligence collection plan was not properly prepared.

G1. The command group sometimes made the unwarranted assumption that
all team commanders were monitoring their radios for changes. (This
was an instance where a yes response indicated the incorrect behavior.)

I1. There was too much radio communication. (Another case where
yes meant wrong.)

I2. There were security violations during radio traffic. (A third
such case.)

TABLE 4

Deficiencies in Administration and Logistics

                                        Mechanized       Non-Mechanized
Item                                    CFO    Attack    Defense   Attack

6. In providing supplies and
equipment to arm and fuel the
system during the battle: (9A)

   d. Did the S4 coordinate with        50%              33%
   the S2 so he knew the enemy's
   capabilities?

   e. Did the S4 keep his higher        45%
   appropriately informed of his
   activities?

   g. Did the S4 effectively                                       [illegible]
   utilize his direct support assets?

9. In terms of integrating CSS
into the scheme of maneuver:

   b. Were transportation               46%
   assets used to fit movement of
   CSS resources to the scheme of
   maneuver?

   c. How effectively was CSS           3.00
   integrated into the scheme of
   maneuver? (9D)



TABLE  5 

Deficiencies  in  Intelligence  and  Operations 


Item

Mechanized: CFO, Attack

Non-Mechanized: Defense, Attack

A.    Intelligence  preparation  of 
the  battlefield 

1. Was the enemy's scheme of
maneuver and fire support
identified?

41% 

48% 

2. Was the enemy's ability
to attack by air identified?

45% 

N/A 

46% 

N/A 

3.    Was  the  enemy's  nuclear 
capability  identified? 

32% 

N/A 

36% 

N/A 

4.    Was  the  enemy's  chemical 
capability  identified? 

39% 

N/A 

44% 

N/A 

5. Overall, how well did the
command group identify critical
combat information and
intelligence? (1B, 2A)

2.76 

2.97 

6.    Were  all  GSR  elements 
effectively  utilized? 

10% 

7.    Was  the  TF  intelligence 
collection  plan  properly  prepared, 
and  did  it  reflect  analysis  by  the 
battalion  S2  of  tasking  responsi- 
bilities? 

:.67 

2.47 

2.74 

2.53 

8. Overall, how well did the
command group determine combat
information and intelligence
shortfalls and aggressively gather
information from all available/
appropriate sources? (2B)

2.71

2.73

9. Was relevant information
from higher headquarters and
adjacent units disseminated to company
commanders (e.g., minefields)?

2.85 

2.98 

N/A  -  Not  applicable. 


TABLE  5  (Continued) 
Deficiencies  in  Intelligence  and  Operations 


Mechanized 

Non-Mechanized 

CFO 

Attack 

Defense 

Attack 

10.    Were  company  commanders 
given  an  estimate  of  specifically 
what  they  would  be  facing? 

36% 

- 

- 

- 

11.    Overall,  did  the  command 
group  disseminate  combat  informa- 
tion and  intelligence  that  was 
event-oriented  and  usable  to  the 
recipient?  (2D) 

2.87 

- 

- 

- 

D. Communicate/coordinate plans
and orders.

1.    Were  company  commanders 
given  instructions  on  actions  to 
be  performed  if  jamming  occurs? 

49% 

2. Were effective alternate
means of communication developed
in case of lost commo?

32% 

44% 

3.    Was  wire  utilized  as  an 
effective  means  of  communication? 

N/A 

35% 

N/A 

4. Did the command group
develop a communication plan which
satisfies the communications
requirements of the specific mission,
provides for COMSEC, specifies
alternative means of communication,
and insures operation of MIJI plan?

2.79 

2.74 

- 

- 

5. Did all elements understand
what they were to do without
extensive questioning?

46% 

6.    Did  the  operation  order 
contain  enough  information  for 
attached  units? 

2.86 

- 

i  

N/A  -  Not  applicable. 


TABLE 5 (Continued)
Deficiencies in Intelligence and Operations


Item

7. Was sufficient time allowed
to task force elements for their
troop leading procedures?

8. Overall, were the orders
appropriate, clear, concise, and
did they contain essential
information; were they issued so as to
allow TF elements maximum time to
go through troop leading
procedures; and were they coordinated
with proper agencies? (3G)

E. See the battlefield during
the battle.

1. How well did the command
group disseminate combat information and
intelligence that was event-oriented,
usable to the recipient,
accurate, and within a time frame
which permitted [illegible] to
react? (5D)

F. Troop lead during the battle.
(11A)

1. Were all [illegible]
units adequately [illegible]/
monitored during the [illegible] of
the exercise?

G. Coordinate/communicate changes.
(6B)

1. Did the command group
sometimes assume all commanders
were monitoring radio for changes?


H. Concentrate/shift combat power.

1. How well did the command
group read the battlefield & determine
the precise place & time for
maximum combat power needed to be
employed? (8A)


Mechanized 


Non-'Mechanized 


44% 


2.79 


Alrtack 


.4% 


Defense 


Attack 


3.00 


2.94 


56% 


MZL 


56% 


6^ 


2.60 
TABLE 5 (Continued)
Deficiencies in Intelligence and Operations


2. When the enemy committed
itself, did the command group adequately
redeploy forces?

3. Were tactical decisions
made consistent with the time-distance
relationship?

4. Overall, how well did the
command group concentrate its
organic/attached/DS assets according
to the weapons capabilities and
movement of the enemy force? (8B/C)

5. Overall, how well did the
command group direct organic/supporting
forces to conduct economy
of force operations in the thinly
held areas (when concentrating
combat power)? (8D)

I. Enemy EW considerations.

1. Was there too much communication?

2. Did security violations
occur during radio traffic?


3. Overall, how well did the
command group adhere to communications
and electronic security
measures? (10A)

4. Was a MIJI report promptly
submitted to higher headquarters
using secure means of communication?


Mechanized
CFO  Attack


2.4?


39%


2.71^5


N/A


Defense


2.30


88%
54%
2.81


Attack


3.M


"3%


612


2 J


9iZ


507.


61%


4o:


N/A - Not applicable.
TABLE 5 (Continued)
Deficiencies in Intelligence and Operations


Mechanized 

Non-Mechanized 

CFO 

Attack 

Defense 

Attack 

direct a switch to spare frequency
as a last resort using proper
authentication techniques?

23% 

36% 

34% 

43% 

Overall, how well did the
command group recognize and react

2.82

2.85 

- 

- 

J.  Other. 

1. Was there sufficient
intra-staff coordination between 2/3
and 1/4?

41/b

JO/b

2. Was there sufficient coordination
between NCS and [illegible]?

3. How well did the command
group apply the time-distance
relationship while maneuvering
Task Force elements?

2.86 

4. Did the Task Force maneuver
elements become decisively
engaged because of battalion action?

56% 

N/A 

N/A 

N/A 

5.    What  was  the  size  of  the 
battalion  reserve? 

23%

N/A 

N/A 

N/A  -  Not  applicable. 

NCS - Net Control Station.
I5. Spare frequencies were not used correctly.


J5. The battalion reserve was too large. The percentage for this
item indicates the size of the battalion reserve, which should have been
about 10%, in the judgement of the raters.

The largest number of deficiencies occurred in the covering force
operation (33) and the next largest in the mechanized attack (22). There
were relatively few deficient performances in the defense (9) or in the
non-mechanized attack (6). The greatest concentrations of deficiencies
were in categories:

A.  Intelligence  preparation  of  the  battlefield. 

D.  Communicate/coordinate  plans  and  orders. 

H.  Concentrate/shift combat power.

I.  Enemy  EW  considerations. 

Categories A, D and I were weak in the covering force and mechanized
attack; Category H, mainly in the covering force.

Fire Support. The deficiencies in fire support are labelled in
Table 6 as they were on the observation form. Only Item 1a, planning
the utilization of heavy mortars, was deficient in several types of
mission. All the other deficiencies, which were related to determining
the initial priority of fires (Subtask 1J) and modifying the fire support
plan (Subtask 7A), were confined to the covering force. Items 4a and
4b were parts of the overall performance encompassed by Item 4c.

Monitor's Ratings. The subtasks evaluated by the monitor are
labelled in Table 7 as they were on his observation form and in the
command group ARTEP. Only four subtasks were deficient in three of the
four missions: 2C (Analyze enemy), 2D (Disseminate critical intelligence),
3G (Communicate/coordinate plans and orders), and 12A (React to
enemy electronic warfare). However, twelve subtasks, including 2C, 2D
and 3G, were deficient in both the covering force and the mechanized
attack; these were primarily the subtasks concerned with intelligence
before (2A, 2B, 2C and 2D) and during (5A, 5B, 5C and 5D) the battle,
and with managing combat service support (9A, 9B and 9C). Altogether
19 items were deficient in the covering force and 15 in the mechanized
attack, compared to just three in the defense and three in the
non-mechanized attack.
TABLE 6
Deficiencies in Fire Support


Mechanized

Non-Mechanized

Item

CFO

Attack

Defense

Attack

1. Plan use of organic/attached
and non-organic fires. (1I)

1

1
I

a. Did fire plan effectively
utilize heavy mortars?

[

1

40%

2. Determine priority of fires. (1J)

a. Was priority of fires given
to appropriate TF element(s) to
support scheme of maneuver?

-

-

b. Was suppression of fires
considered?

sm

-

4. Modify fire support plan.

a. During the battle was
priority of fires supporting new
scheme of maneuver immediately
communicated to supporting and
supported units?

b. Were requests for immediate
fire support received and
assigned to appropriate fire support
agencies?

-Ml

c. Overall, how well did the
command group perform relative to
the standard? (7A)

2.90
TABLE 7

Deficiencies Observed by the Monitor


Mechanized

Non-Mechanized

Item

CFO

Attack

Defense

Attack

1. Develop plan based on mission.

B. Identify critical intelligence.

^ • i o

I. Plan use of organic/
attached and non-organic fires.

^' 5

-

-

-

2. Initiate intelligence preparation
of the battlefield.

A. Identify critical intelligence.

^ 55

2.67

-

-

B. Gather critical intelligence.

2.83

-

C. Analyze enemy.

^ .

D. Disseminate critical
intelligence.

2. ^3

2.91

2 - f

3. Prepare and organize the
battlefield.

E. Plan organic, attached and
non-organic supporting fires and
determine priority.

-

3.00

-

-

Develop a communication
plan.

2.71

G. Communicate/coordinate
plans and orders.

2.73

2.73

2.83

I. Employ active/passive
security measures.

2.60
TABLE 7 (Continued)
Deficiencies Observed by the Monitor


Mechanized 

Non-Mechanized 

Item 

CFO 

Attack 

Defense 

Attack 

5.    See  the  battlefield  during  the 
battle. 

A. Identify critical intelligence.

2.75

2.70 

■ 

B. Gather critical intelligence.

2.92 

2.80 

- 

- 

C.    Analyze  enemy. 

2.42 

3.00 

D. Disseminate critical
intelligence.

3.00 

2.89 

7. Employ fires and other combat
assets.

A. Modify fire support plan.

2.92

B. Employ fires.

3.00 

- 

- 

- 

8.    Concentrate/shift  combat  power. 

A.    Determine  critical  place 
and  time. 

2.92 

- 

- 

2.83 

B/C.    Concentrate/shift  combat 
power. 

2.83 

9.    Manage  combat  service  support 
assets. 

A.    Arm  and  fuel  the  systems. 

3.00 

3.00 

B.    Fix  the  system. 

3.00 

2.89 

C.    Support  the  troops 

3.00 

3.00 

D. Integrate CSS into scheme
of maneuver.
TABLE 7 (Continued)
Deficiencies Observed by the Monitor


Mechanized

Non-Mechanized 

Item 

CFO 

Defense 

Attack 

12.    React  to  situations  requiring 
special  actions. 

A.    React  to  enemy  electronic 
warfare. 

3.00 

3.00 

2.00 
Summary. The percentage of items that were judged deficient in
each type of mission is shown in Figure 3. The four percentages for
each mission refer to (1) administration and logistics rated by the
S1/S4 controller, (2) intelligence and operations rated by the S2/S3
and company commander controllers, (3) fire support rated by the fire
support controller, and (4) subtasks rated by the monitor. Every
evaluator reported the greatest percentage of deficiencies in the
covering force operation. The next highest percentages of deficiencies
were in the mechanized attack, specifically in the intelligence and
operations items and in the monitor's ratings. There were relatively
few deficiencies in the defense and the non-mechanized attack.
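
As a rough cross-check, the intelligence and operations percentages and the monitor's percentages can be recomputed from counts given elsewhere in this paper. The sketch below assumes that the deficiency counts quoted in the text (33, 22, 9 and 6 items; 19, 15, 3 and 3 subtasks) share the same item base as the totals in the notes to Tables 14 and 16; the function and variable names are illustrative only.

```python
def percent_deficient(n_deficient: int, n_items: int) -> int:
    """Return the share of deficient items as a whole-number percentage."""
    return round(100 * n_deficient / n_items)


# (deficient items, total items) per mission, as reported in the paper.
# Assumption: the counts and the totals refer to the same set of items.
INTEL_OPS = {
    "covering force": (33, 93),
    "mechanized attack": (22, 89),
    "defense": (9, 95),
    "non-mechanized attack": (6, 88),
}
MONITOR = {
    "covering force": (19, 47),
    "mechanized attack": (15, 47),
    "defense": (3, 47),
    "non-mechanized attack": (3, 47),
}


def summarize(counts):
    """Map each mission to its percentage of deficient items."""
    return {mission: percent_deficient(d, n) for mission, (d, n) in counts.items()}


if __name__ == "__main__":
    for label, counts in (("S2/S3", INTEL_OPS), ("Monitor", MONITOR)):
        for mission, pct in summarize(counts).items():
            print(f"{label:8s} {mission:22s} {pct:3d}%")
```

Under that assumption the covering force figures come out at 35% for the S2/S3 items and 40% for the monitor, which agrees with the values legible in Figure 3.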

The preceding results indicate that the mechanized attack was performed
better than the covering force and that both non-mechanized missions
were performed better than the mechanized missions. These relationships
were generally supported by further analysis of the data.
Table 8 summarizes the mean scores on the ARTEP subtasks evaluated by
each rater for each mission. A higher score means the subtask was performed
better. An analysis of variance was done for each set of ratings.
In Kirk's¹ terminology, it was a split-plot factorial design with repeated
measures over two factors, mission (first versus second) and subtasks,
with type of unit (mechanized versus non-mechanized) a grouping
factor. The ANOVAs showed that the second mission was significantly
better than the first (p<.05) for the administration and logistics
ratings, and for fire support. The non-mechanized groups scored significantly
higher than the mechanized ones only on the intelligence and
operations subtasks (p<.001). Planned comparisons (t tests) showed
that the mechanized attack was performed better than the covering force
for all four sets of subtasks: administration and logistics (p<.01),
intelligence and operations (p<.05), fire support (p<.001), and the
monitor's ratings (p<.05). The non-mechanized attack was better than
the defense only for fire support (p<.001). Because the attack was
always the second mission, it is not possible to decide whether it was
an easier mission to perform or whether the command groups improved with
practice from the first mission to the second. It is probable, however,
that the covering force was more difficult than the defense, because
these two missions were always performed first.


CORRELATES  OF  EFFECTIVENESS 

Eight measures of effectiveness were obtained in this investigation:
the overall performance of the S1 and of the S4 were rated by the S1/S4

¹Kirk, R. E. Experimental Design: Procedures for the behavioral
sciences. Belmont, California: Brooks/Cole, 1968.
[Figure 3: bar chart panels by mission, each with bars for the S1/S4,
S2/S3, FS and Monitor raters. Covering Force Operation: 15%, 35%, 43%,
40%. Mechanized Attack: 0%, 25%, 0% (fourth value not legible). Two
further panels, titles not legible: 9%, 7%, 6%, 4%, 7%, 8%, 6%, 4%.]

Figure 3. Percentage of items rated deficient by each rater
for each type of mission.
TABLE 8

Mean Ratings on ARTEP Subtasks


                                Mechanized        Non-Mechanized
Item                            CFO    Attack     Defense   Attack

Administration and logistics    2.9    3.7        3.4       3.7
Intelligence and operations     2.9    3.1        3.6       3.5
Fire support                    3.2    3.9        3.4       4.1
Monitor                         3.1    3.4        3.2       3.3
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controller,  the  S2  and  S3  were  rated  by  the  S2/S3  controller  and  the 
company  commander  controllers,  the  utilization  of  fire  support  assets 
(FS)  was  rated  by  the  fire  support  controller,  the  battalion  commander 
(Cdr)  and  mission  accomplishment  (Msn)  were  rated  by  the  monitor,  and 
the  overall  effectiveness  of  the  command  group  as  a  whole  (CG)  was  rated 
by  the  S2/S3  and  company  commander  controllers  and  by  the  monitor. 

Intercorrelations between Effectiveness Ratings. The intercorrelations
among the measures of effectiveness for each mission are shown in
Tables 9 through 12. Most of the correlations in Tables 9 and 10 are
based on 13 cases; most of those in Tables 11 and 12 are based on 10
cases. Correlations that are significant beyond the .01 level are marked
with asterisks. The statistical significance of a correlation depends
on the number of cases on which it is based, as well as on its size.
Therefore, a given correlation may not be significant, even though it
is larger than a significant correlation that is based on more cases.
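
The dependence of significance on the number of cases can be sketched with the usual conversion between a Pearson correlation and a t statistic with n - 2 degrees of freedom. The paper does not say whether its tests were one-tailed or two-tailed; the sketch below assumes one-tailed tests at the .01 level and uses critical t values from standard tables (2.718 for 11 degrees of freedom, 2.896 for 8). All function names are illustrative.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic (n - 2 degrees of freedom) for a correlation r based on n cases."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that reaches the critical t value with n cases."""
    df = n - 2
    return t_crit / math.sqrt(t_crit * t_crit + df)


# One-tailed .01 critical values of t, keyed by number of cases (df = n - 2).
# Assumption: one-tailed tests; the source does not state which was used.
T_CRIT_01 = {13: 2.718, 10: 2.896}


def is_significant(r: float, n: int) -> bool:
    """True if r based on n cases reaches the assumed .01 critical value."""
    return abs(t_from_r(r, n)) >= T_CRIT_01[n]


if __name__ == "__main__":
    for n, t_crit in T_CRIT_01.items():
        print(f"n = {n}: |r| must be at least {critical_r(t_crit, n):.3f}")
    print(is_significant(0.70, 13), is_significant(0.70, 10))
```

Under these assumptions a correlation of .70 would be significant with 13 cases but not with 10, which is the situation the paragraph above describes.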

As would be expected, the rating of the command group as a whole
was the measure most highly correlated with the other measures of
effectiveness. Eight of the eleven significant correlations in Tables
9 through 12 involved the command group rating. The most consistently
related pair of variables were the S3 and command group ratings, which
were significantly correlated in all four missions. On the other hand,
only one of the monitor's effectiveness ratings was significantly correlated
with another effectiveness rating, viz., mission accomplishment
with command group effectiveness in the defense. Probably the reason
that no other monitor's ratings were significantly correlated was that
every command group was rated by a different monitor. Variation in
personal evaluative criteria from one monitor to another probably reduced
the correlations between effectiveness ratings. The FS rating was not
significantly correlated with any other measure of effectiveness, which
probably reflects a lack of integration between fire support and the
other command group functions.

Administration  and  Logistics.    Table  13  shows  that  most  of  the  admin/ 
log  items  significantly  correlated  with  overall  effectiveness  ratings 
were  correlated  with  the  rating  of  the  S4.    This  is  a  plausible  result, 
because  most  of  the  items  refer  to  S4  functions.    There  were  fewer  sig- 
nificant correlations  in  the  non-mechanized  missions  than  in  the  mech- 
anized ones,  because  the  performance  ratings  were  consistently  high  in 
the  non-mechanized  missions.    As  noted  in  the  Method  Section,  when  the 
range  of  the  variables  is  restricted,  the  correlation  between  them  is 
reduced. 
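
The effect of range restriction on a correlation can be illustrated with a small simulation. The sketch below does not use the study's data: it draws hypothetical normally distributed scores with an assumed underlying correlation of .70, then recomputes the correlation using only the cases in the top 30% on one variable, roughly analogous to a mission in which nearly every group is rated high. The parameter values and names are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, rho=0.7, keep_top=0.3, seed=1978):
    """Return (r over all cases, r over the top `keep_top` share of x only)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # y shares variance with x so that the population correlation is rho.
    ys = [rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1) for x in xs]
    cutoff = sorted(xs)[int(n * (1 - keep_top))]
    kept = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    r_full = pearson_r(xs, ys)
    r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])
    return r_full, r_restricted


if __name__ == "__main__":
    r_full, r_restricted = simulate()
    print(f"all cases: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```

The restricted-range correlation comes out noticeably lower than the full-range one, even though the underlying relationship between the two variables is unchanged.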
TABLE 9

Intercorrelations Between Effectiveness Ratings
for the Covering Force Operation


       S2      S3      S4      FS      Cdr     Msn     CG

S1     .24     .47     .87**   .37     .41     .14     .73*
S2             .53     .28     .27    -.61    -.11     .38
S3                     .36     .37    -.21    -.38     .68*
S4                             .37     .40     .27     .66*
FS                                     [?]     [?]     [?]
Cdr                                            .23    -.11
Msn                                                   -.31

*p<.01
**p<.001


TABLE  10 

Intercorrelations Between Effectiveness Ratings
for the Mechanized Attack


S2 

S3 

S4 

FS 

Cdr 

Msn 

CG 

S1

.59 

.63 

.26 

.65 

.21 

.50 

S2

.87** 

.68* 

.24 

.39 

.22 

.77** 

S3 

.58 

.32 

.42 

.57 

.90** 

S4 

.39 

.33 

.10 

.58 

FS 

.02 

.35 

.40 

Cdr 

.14 

.46 

Msn  .66 

*p<.01
**p<.001
TABLE 11

Intercorrelations Between Effectiveness Ratings
for the Defense


       S2      S3      S4      FS      Cdr     Msn     CG

S1    -.51    -.67    -.08     .64     .40    -.27    -.46
S2             .64    -.24    -.27    -.42     .27     .19
S3                    -.06    -.18     .10     .47     .71*
S4                             .53    -.04    -.15     .07
FS                                     [?]     [?]     [?]
Cdr                                            .63     .76
Msn                                                    .79*

*p<.01


TABLE 12

Intercorrelations Between Effectiveness Ratings
for the Non-Mechanized Attack


       S2      S3      S4      FS      Cdr     Msn     CG

S1    -.36    -.14     .09     .50    -.18    -.17     .00
S2             .38    -.18    -.13     .22     .63     .17
S3                    -.08     .08     .32     .14     .73*
S4                             .08     .54     .65     .15
FS                                     .51    -.04     .18
Cdr                                           -.15     .42
Msn                                                    .52

*p<.01
Inspection of the data showed that most of the admin/log subtasks
or their elements were significantly correlated with overall effectiveness
ratings in all four missions. The subtasks rated by the S1/S4
controller were 3J (Provide supplies) and 3K (Maintain equipment),
performed in preparation for the battle; and 9A (Arm and fuel the systems),
9B (Fix the systems), 9C (Support the troops), and 9D (Integrate CSS
into scheme of maneuver), performed during the battle. There were only
a few cases in which one of these subtasks or its elements was not
correlated with a measure of effectiveness. Subtask 3K was not correlated
with effectiveness in the first mission (covering force or defense),
when there was no maintenance to perform. Subtasks 9C and 9D were not
correlated with effectiveness in the mechanized attack, when they were
performed consistently well, with little variation. With the preceding
exceptions, some elements of every admin/log subtask were significantly
correlated with S1, S4, or command group effectiveness in all four
missions.

Intelligence and Operations. About 20% of the items in this category
were significantly correlated with the effectiveness rating for the
command group as a whole (Table 14). Somewhat fewer items were correlated
with the S2 and S3 ratings. There were more significant correlations
in the two mechanized missions, because there was a wider range
of variation in performance, as noted above.

Most  of  the  intel/ops  items  were  related  to  ARTEP  subtasks  and  ele- 
ments thereof.    The  following  subtasks  or  their  elements  were  consistently 
correlated  with  effectiveness  ratings  in  three  or  in  all  of  the  four 
different  missions: 


2B.  Gather critical combat information and intelligence.

2C.  Analyze  opposing  force. 

2D.  Disseminate  critical  combat  information  and  intelligence. 

3C.  Organize  for  combat. 

3G.  Communicate/coordinate  plans  and  orders. 

(The  preceding  subtasks  were  performed  in  preparation  for  the 
battle;  the  following  were  performed  during  the  battle.) 


5A.  Identify critical combat information and intelligence.
TABLE 13

Number of Administration and Logistics Items
Significantly Correlated (p<.01) With Effectiveness Ratings

Effectiveness       Mechanized          Non-Mechanized
Rating              CFO     Attack      Defense   Attack

S1                  14      2           0         0
S4                  16      9           12        4
CG                  5       3           0         0

NOTE: The total number of items was 26 in the CFO and defense,
28 in the attack.


TABLE 14

Number of Intelligence and Operations Items
Significantly Correlated (p<.01) With Effectiveness Ratings

Effectiveness       Mechanized          Non-Mechanized
Rating              CFO     Attack      Defense   Attack

S2                  9       9           8         2
S3                  3       18          8         8
CG                  21      17          11        19

NOTE: The total number of items was 93 in the covering force,
89 in the mechanized attack, 95 in the defense and 88 in the
non-mechanized attack.
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5B.    Gather  critical  combat  information  and  intelligence. 
6B.  Coordinate/communicate changes.
8A.  Determine critical place and time.

8B/C.  Concentrate/shift combat power. (8B, in the attack; 8C, in
the defense or retrograde).

10A.  Defeat or suppress the enemy's electromagnetic intelligence
effort.

11A.  Troop lead during battle.

Another  item,  not  explicitly  part  of  any  subtask,  that  was  signifi- 
cantly correlated  with  effectiveness  ratings  in  all  four  missions,  was 
the  question:    "How  well  did  the  command  group  apply  the  time-distance 
relationship  while  maneuvering  Task  Force  elements?" 

Subtasks 1B and 2A, which are separate items in the ARTEP, could
not be evaluated separately by the controllers.

Fire Support. One third of the items rated by the fire support
controller were significantly correlated with his rating of overall fire
support effectiveness (Table 15). However, none of the items were
correlated with the rating of command group effectiveness, which is
consistent with the low correlation between FS and CG ratings mentioned
earlier. Examination of the results showed that all four subtasks rated
by the fire support controller were significantly correlated with his
rating of fire support effectiveness in three of the four missions. In
the non-mechanized attack, when the subtasks were performed consistently
well, Subtask 1L was the only one significantly correlated with fire
support effectiveness.

The  four  subtasks  rated  by  the  fire  support  controller  were: 

1I.  Plan use of organic/attached and non-organic fires.

1J.  Determine priority of fires.

1L.  Conduct initial fire support coordination.

7A.    Modify  fire  support  plan. 
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TABLE  15 
Number of Fire Support Items
Significantly Correlated (p<.01) With Effectiveness Ratings


Effectiveness       Mechanized          Non-Mechanized
Rating              CFO     Attack      Defense   Attack

FS                  5       6           6         1
CG                  0       0           0         0

NOTE: The total number of items was 14 in the covering force
and defense, 13 in the attack.


TABLE 16

Number of Items Rated by the Monitor
Significantly Correlated (p<.01) With Effectiveness Ratings

Effectiveness       Mechanized          Non-Mechanized
Rating              CFO     Attack      Defense   Attack

Msn                 0       0           3         0
Cdr                 2       0           3         1
CG                  0       0           [?]       [?]

NOTE: The total number of items in each mission was 47.
Monitor. Few of the items rated by the monitor were correlated
with any measure of overall effectiveness (Table 16), probably because
each command group was observed by a different monitor.

Summary. Figure 4 presents the percentage of items on each observation
form that were significantly correlated with one or more measures
of effectiveness. In the two mechanized missions and in the defense,
many of the S1/S4 items (39% to 65%) and the fire support items (43%
to 46%) were correlated with effectiveness. Few of them (14% and 8%)
were related to effectiveness in the non-mechanized attack, when they
were generally performed well. About one-fourth of the S2/S3 items
(22% to 29%) were strongly related to effectiveness in all four missions.
Few of the monitor's ratings (0% to 13%) were correlated with effectiveness
in any mission.

Comparison  with  Figure  3  shows  that  many  more  items  were  correlated 
with  effectiveness  than  were  deficient.    This  was  especially  true  in  the 
defense,  where  a  substantial  percentage  of  the  items  were  significantly 
related  to  effectiveness,  but  few  of  them  were  deficient. 


CRITICAL PERFORMANCES

In terms of the criteria employed in this investigation, the most
important command group performances were those that were both deficient
and significantly correlated with overall effectiveness. Twenty-four of
the performances evaluated in the areas of administration and logistics,
intelligence and operations, and fire support were identified as critical
in at least one mission. These performances and the missions in which
they were critical are listed in Tables 17 through 19, below.
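
The joint criterion amounts to a simple filter over item-by-mission results. The sketch below is illustrative only: the record layout, the item labels X1 to X3 and the example flags are hypothetical, and the rules used to set the deficiency and correlation flags are those of the Method section, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemResult:
    item: str          # item label on the observation form (hypothetical here)
    mission: str       # mission in which the item was rated
    deficient: bool    # item met the deficiency criterion in that mission
    correlated: bool   # item was significantly correlated with effectiveness


def critical_performances(results):
    """Map each critical item to the missions in which it was critical.

    An item is critical in a mission only if it was both deficient and
    significantly correlated with effectiveness in that mission.
    """
    critical = {}
    for r in results:
        if r.deficient and r.correlated:
            critical.setdefault(r.item, []).append(r.mission)
    return critical


# Hypothetical records, not data from the study.
EXAMPLE = [
    ItemResult("X1", "CFO", True, True),
    ItemResult("X1", "Mech atk", True, True),
    ItemResult("X2", "CFO", True, False),      # deficient but not correlated
    ItemResult("X2", "Defense", False, True),  # correlated but not deficient
    ItemResult("X3", "CFO", True, True),
]

if __name__ == "__main__":
    for item, missions in critical_performances(EXAMPLE).items():
        print(item, ", ".join(missions))
```

In this made-up example, item X2 is never critical because it fails one of the two conditions in each mission, while X1 is critical in two missions and X3 in one.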

Administration and Logistics. Four items evaluated by the S1/S4
controller were both deficient and correlated with effectiveness. Three
of them were critical in the covering force operation and one in the
defense, as shown in Table 17. They are labelled in the table as they
were on the controller's observation form. The first three items were
elements of Subtask 9A, Arm and fuel the systems. The fourth item was
Subtask 9D, Integrate CSS into the scheme of maneuver.

Intelligence and Operations. The 17 items that were identified as
critical in this area are listed in Table 18. They are numbered
consecutively in each category. They were renumbered in the table, because
they did not have the same numbers on the observation forms for each
mission. Five of the items corresponded to ARTEP subtasks. These were
items A2, A4, A7, D1, and I3, which referred to the following subtasks:

1B, 2A.  Identify critical combat information and intelligence.
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Figure  4.  Percentage  of  Items  significantly  correlated  (p<.01) 
with  effectiveness  for  each  rater  and  type  of  mission. 
TABLE 17
Critical Performances
in Administration and Logistics

Item                                                        Mission

6. In providing supplies and equipment to arm and fuel
the system during the battle (9A);

d. Did the S4 coordinate with the S2 so he knew the         CFO
enemy's capabilities?

e. Did the S4 keep his higher appropriately                 CFO

g. Did the S4 effectively utilize his direct support        Defense
assets?

9. c. How effectively was CSS integrated into the           CFO
scheme of maneuver? (9D)




TABLE 18
Critical Performances
in Intelligence and Operations

Item                                                        Mission

A. Intelligence preparation of the battlefield.

1. Was the enemy's scheme of maneuver and fire              CFO
support identified?

2. Overall, how well did the command group identify         CFO
critical combat information and intelligence? (1B, 2A)

3. Was the TF intelligence collection plan properly         CFO,
prepared, and did it reflect analysis by the                Non-mech atk
battalion S2 of tasking responsibilities?

4. Overall, how well did the command group determine        CFO,
combat information and intelligence shortfalls              Mech atk
and aggressively gather information from all
available/appropriate sources? (2B)

5. Was relevant information from higher headquarters        Mech atk
and adjacent units disseminated to company commanders
(e.g., minefields)?

6. Were company commanders given an estimate of             CFO
specifically what they would be facing?

7. Overall, did the command group disseminate combat        CFO
information and intelligence that was event-oriented
and usable to the recipient? (2D)

D. Communicate/coordinate plans and orders.

1. Overall, were the orders appropriate, clear,             CFO,
concise, and did they contain essential information;        Mech atk
were they issued so as to allow TF elements maximum
time to go through troop leading procedures; and were
they coordinated with proper agencies? (3G)

F. Troop lead during the battle.

1. Were all attached combat units adequately                CFO
controlled/monitored during the conduct of the exercise?
(11A)
TABLE 18 (Continued)
Critical Performances
in Intelligence and Operations

Item

G. Coordinate/communicate changes.

1. Did the command group sometimes assume all commanders
were monitoring radios for changes? (6B)


H. Concentrate/shift combat power.


1. Were tactical decisions made consistent with
the time-distance relationship? (8C)

I. Enemy EW considerations.

1. Was there too much communication?

2. Did security violations occur during radio
traffic?

3. Overall, how well did the command group adhere
to communications and electronic security measures?
(10A)

4. Did the command group direct a switch to spare
frequency as a last resort, using proper authentication
techniques? (12A)

J. Other.

1. Was there sufficient intra-staff coordination
between 2/3 and 1/4?

2. How well did the command group apply the time-distance
relationship while maneuvering Task Force
elements?


Mission


Mech atk


Mech atk
CFO,

Mech atk
CFO,

Mech atk


CFO


Mech atk


CFO
2B.  Gather critical combat information and intelligence.

2D.  Disseminate critical combat information and intelligence.

3G.  Communicate/coordinate plans and orders.

10A.  Defeat or suppress opposing force's electromagnetic intelligence
effort.

The subtasks to which the above items referred are indicated in
parentheses after the items in the table. Six other items, A1, A3, A5,
A6, I1 and I2, were elements of these subtasks. Four more items, F1,
G1, H1 and I4, were elements of the following subtasks:

11A.  Troop lead during battle.

6B.  Coordinate/communicate changes.

8C.  Concentrate/shift combat power in the defense or retrograde.

12A.  React to opposing force electronic warfare.

Finally, items J1 and J2 were not classified as part of any specific
subtask. Four of the items in Table 18 were critical in both the
covering force and the mechanized attack, one in the covering force and
non-mechanized attack, eight in the covering force alone, and four only
in the mechanized attack. None were critical in the defense.

Fire Support. The three items in Table 19 are labelled as they were
on the fire support observation forms. Item 1a was part of Subtask 1I,
Plan use of organic/attached and non-organic fires. Item 4b was part
of Subtask 7A, Modify fire support plan, and Item 4c referred to the
entire Subtask 7A. All three performances were critical in the covering
force.

Monitor.    Two  items  rated  by  the  monitor  met  the  joint  criteria  of 
deficiency  and  correlation  with  effectiveness:    Subtask  8A,  Determine 
critical  place  and  time,  and  Subtask  8C,  Concentrate/shift  combat  power 
in  the  defense  or  retrograde,  were  both  identified  as  critical  in  the 
covering  force  operation. 

Summary. It is apparent in Figure 5 that the majority of critical
performances were identified in the covering force operation, where
12% to 21% of the S1/S4, S2/S3 and fire support items were both
deficient and significantly correlated with effectiveness. A secondary
concentration of critical performances (10%) occurred in the S2/S3
TABLE 19
Critical Performances
in Fire Support

Item                                                          Mission

1.  Plan use of organic/attached and non-organic fires. (1I)

    a.  Did fire plan effectively utilize heavy mortars?        CFO

4.  Modify fire support plan. (7A)

    b.  Were requests for immediate fire support received       CFO
        and assigned to appropriate fire support agencies?

    c.  Overall, how well did the command group perform         CFO
        relative to the standard?
Mission                      S1/S4   S2/S3    FS    Monitor

Covering Force Operation      21%     12%    13%      4%
Mechanized Attack              0%     10%     0%      0%
Defense                        4%      0%     0%      0%
Non-Mechanized Attack          0%      1%     0%      0%

Figure 5. Percentage of items identified as critical by each
rater for each type of mission.
area in the mechanized attack. Very few items were critical in the
defense  or  the  non-mechanized  attack. 


RATER  RELIABILITY 

Comparing the performance ratings from several different observers
of the same command group suggested three related questions:

1. How reliable were the ratings of a single observer?

2. How reliable were the mean ratings from several observers?

3. How significant were the differences among raters?

Since the intelligence and operations items were scored by four or five
raters at every exercise, it was possible to measure the amount of
consistency or disagreement among different raters. Rater reliability
measures the consistency within one or more raters; conversely, analysis
of variance tests the significance of differences among raters.

Reliability is defined as the ratio of true score variance to total
score variance, where "true score" means that part of the score that is
the same at each rescoring. Two measures of rater reliability (Guilford,
1954) are the reliability of ratings from a single rater, $r_{11}$, and
the reliability of mean ratings from k raters, $r_{kk}$:

$$ r_{11} = \frac{V_i - V_e}{V_i + (k-1)\,V_e}, \qquad r_{kk} = \frac{V_i - V_e}{V_i} $$

where $V_i$ = variance for items
$V_e$ = variance for error
$k$ = number of raters

The coefficients of rater reliability for eight randomly selected
missions are listed in Table 20 in the order that the missions occurred.
Each coefficient was calculated from the intelligence and operations
items rated on a five-point scale by four or five observers. The
reliability of ratings from a single rater varied from .07 to .38 with a
mean of .22. The reliability of mean ratings from several observers
varied from .29 to .71 with a mean of .55. Thus, increasing the number of
raters from one to four or five more than doubled the rater reliability.
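
The two coefficients are linked: dividing the definitions given above
shows that the reliability of the mean of k raters equals
k r11 / (1 + (k - 1) r11), the Spearman-Brown form. The sketch below
checks this numerically. It is illustrative only; the function names
and the variance values are assumptions, not part of the original
analysis.

```python
def single_rater_reliability(v_items, v_error, k):
    """r_11 = (V_i - V_e) / (V_i + (k - 1) * V_e)."""
    return (v_items - v_error) / (v_items + (k - 1) * v_error)


def mean_rating_reliability(v_items, v_error):
    """r_kk = (V_i - V_e) / V_i."""
    return (v_items - v_error) / v_items


def spearman_brown(r11, k):
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * r11 / (1 + (k - 1) * r11)


# Hypothetical variance estimates (not from the study), chosen only to
# exercise the formulas.
v_items, v_error, k = 2.0, 0.8, 5
r11 = single_rater_reliability(v_items, v_error, k)
rkk = mean_rating_reliability(v_items, v_error)
assert abs(spearman_brown(r11, k) - rkk) < 1e-12

# Typical case reported in the text: five raters, single-rater value .21.
print(round(spearman_brown(0.21, 5), 2))  # 0.57
```

With the reported single-rater value of .21 and five raters, the sketch
returns .57, the value the paper reports for the mean of five raters.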

The  variance  analyses  on  which  the  reliability  estimates  were  based 
showed  that  in  every  case  the  differences  among  raters  were  statistically 

^Guilford,  J.  P.    Psychometric  Methods,  2d  Ed.    New  York:  McGraw 
Hill,  1954. 
TABLE 20
Rater Reliability
for Eight Selected Missions

                  Coefficient of Reliability

Number of         Single            Mean of
Raters, k         Rater, r_11       k Raters, r_kk

    4                .24               .56
    5                .21               .57
    4                .21               .52
    5                .26               .64
    4                .38               .71
    5                .24               .61
    5                .18               .53
    5                .07               .29
significant beyond the .001 level. In a typical case, with $r_{11}$ = .21
and $r_{kk}$ = .57, the mean ratings from five different observers,
averaged over all the five-point intelligence and operations items, were
2.6, 3.5, 3.8, 3.9 and 4.4. Significant differences among the means of
different raters for the same command group are evidence of rater bias,
i.e., the tendency of an individual to give high, or low, ratings. One
implication of this result is that the same rater or, preferably, raters
should be employed when comparing the performances of different command
groups or of the same command group at different times.
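
The variance analysis behind these results can be sketched as a two-way
layout of items by raters. The example below is a minimal illustration
under stated assumptions: it treats "variance for items" and "variance
for error" as mean squares, it removes the rater main effect from the
error term, and the ratings matrix is invented for demonstration. The
paper does not say which of these conventions was actually used.

```python
def rater_anova(ratings):
    """Two-way analysis of variance without replication.

    ratings[i][j] is the score given to item i by rater j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    item_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_items = k * sum((m - grand) ** 2 for m in item_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_items - ss_raters

    ms_items = ss_items / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return {
        "rater_means": rater_means,
        # Large F values indicate systematic differences among raters.
        "f_raters": ms_raters / ms_error,
        "r11": (ms_items - ms_error) / (ms_items + (k - 1) * ms_error),
        "rkk": (ms_items - ms_error) / ms_items,
    }


# Hypothetical ratings: 4 items (rows) scored by 3 raters (columns).
demo = [[2, 3, 4], [3, 4, 5], [1, 3, 3], [3, 4, 5]]
result = rater_anova(demo)
print(result["rater_means"])
```

In this invented matrix the third rater scores every item higher than
the first, so the rater means differ while the ordering of items is
nearly the same for all raters. That pattern is what the text calls
rater bias.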


DISCUSSION 


CRITICAL  SUBTASKS 

The  present  investigation  brings  to  50  the  total  number  of  battalion 
command  groups  whose  performance  in  CATTS  exercises  has  been  analyzed. 
Twenty-three groups were observed in the present study and 27 in the
previous one (Barber and Kaplan, op. cit.). Both investigations used
essentially the same criteria
of  low  performance  rating  and  high  correlation  with  effectiveness  to 
identify  critical  performances,  although  there  were  some  differences  in 
detail.    One  difference  was  that  in  the  present  investigation  a  subtask 
was  considered  critical  when  any  one  of  its  elements  was  critical; 
whereas  in  the  previous  investigation  subtask  elements  were  not  rated. 
Another  difference  was  in  the  cut-off  point  for  identifying  low-rated 
performances.     In  the  present  investigation,  a  performance  was  classified 
as  deficient  when  its  mean  rating  was  at  or  below  the  midpoint  of  the 
rating  scale.     In  the  previous  investigation,  a  performance  was  considered 
deficient  when  its  mean  rating  was  one  standard  deviation  below  the  mean 
of  all  the  subtasks  rated  by  the  observer  who  evaluated  it. 
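
The two deficiency rules can be compared side by side. The sketch below
is a hypothetical rendering: the function names are invented, the sample
standard deviation is assumed, and "one standard deviation below the
mean" is read as "at or below the mean minus one standard deviation".
The test of correlation with effectiveness is represented only by a
boolean flag.

```python
from statistics import mean, stdev


def deficient_midpoint(mean_rating, scale_min=1, scale_max=5):
    """Present-study rule: mean rating at or below the scale midpoint."""
    return mean_rating <= (scale_min + scale_max) / 2


def deficient_relative(mean_rating, rater_subtask_means):
    """Previous-study rule (as assumed here): at or below the rater's
    overall mean minus one sample standard deviation."""
    cutoff = mean(rater_subtask_means) - stdev(rater_subtask_means)
    return mean_rating <= cutoff


def is_critical(deficient, significantly_correlated):
    """A performance is critical only when both criteria are met."""
    return deficient and significantly_correlated


# Hypothetical subtask means for one rater on a five-point scale.
rater_means = [1.0, 2.0, 3.0, 4.0, 5.0]
print(deficient_midpoint(3.0), deficient_relative(2.0, rater_means))
```

The relative rule adapts to a rater who rarely uses the low end of the
scale, which is why it suited the earlier three-point ratings.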

The reason a relative criterion was used in the previous investigation,
rather than the midpoint of the scale, was that very few subtasks
were rated below the middle of the three-point scale used in that study.
The three points were defined as:

1  -  Unsatisfactory,  major  departure  from  ARTEP  standard 

2  -  Minor  deviation  from  standard 

3  -  Satisfies  the  standard, 


Barber, H. F., and Kaplan, I. T. op. cit.
and the lowest rating was rarely given. In contrast, the five-point
scale used in the present investigation increased the range of scores
given for subtask performance, so that for every rater there were several
subtasks whose mean ratings were below the midpoint of the scale.

In spite of some differences in procedure between the two
investigations, there was considerable agreement between them with
respect to the subtasks that were identified as critical. Table 21
summarizes all the subtasks that were critical in either study - a total
of 20 different subtasks. The X's in the two right-hand columns indicate
the study in which each subtask was critical. It can be seen that 14
subtasks were identified in the previous investigation and 15 in the
present one. Nine subtasks were critical in both studies: Subtasks 1B,
1I, 2A, 2B, 3G, 8A, 8C, 10A and 12A.

Furthermore, all the subtasks that were critical in only one study
met one criterion of criticality in the other. Thus, five subtasks
(2C, 5B, 5C, 5D and 8B) were critical in the previous investigation,
but not in the present one. Four of them (2C, 5B, 5C and 5D) were
deficient in the present investigation; however, they were not
correlated with effectiveness. The fifth (8B) was correlated with
effectiveness, but was not deficient. Conversely, six subtasks (2D, 6B,
7A, 9A, 9D and 11A) that were critical in the present investigation were
not critical in the previous one. Subtask 2D was deficient in the
previous study, but it was not then correlated with effectiveness. The
other five subtasks (6B, 7A, 9A, 9D and 11A) were correlated with
effectiveness, but were not deficient. Overall, the two investigations
of battalion command group performance were in general agreement
concerning the identification of critical ARTEP subtasks.
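
The counts in this comparison can be verified with simple set
arithmetic. In the sketch below the subtask labels are taken from the
text and written with digits (for example 1B, 1I, 10A); the variable
names are illustrative.

```python
# Subtasks critical in both studies, and in only one of them.
both = {"1B", "1I", "2A", "2B", "3G", "8A", "8C", "10A", "12A"}
previous_only = {"2C", "5B", "5C", "5D", "8B"}
present_only = {"2D", "6B", "7A", "9A", "9D", "11A"}

previous = both | previous_only
present = both | present_only

print(len(previous))            # 14 critical in the previous study
print(len(present))             # 15 critical in the present study
print(len(previous | present))  # 20 different subtasks in all
print(len(previous & present))  # 9 critical in both studies
```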

The  four  missions,  however,  differed  greatly  with  respect  to  subtask 
criticality.    All  but  one  (6B)  of  the  15  subtasks  that  were  critical  in 
this  investigation  were  critical  in  the  covering  force  operation.  Five 
subtasks (2B, 2D, 3G, 6B and 10A) were critical in the mechanized attack.
Only  one  (9A)  was  critical  in  the  defense,  and  one  (2B)  in  the  non- 
mechanized  attack.    The  distribution  of  critical  subtasks  paralleled 
the  distribution  of  performance  deficiencies,  i.e.,  the  greatest  number 
of  deficiencies  occurred  in  the  covering  force  mission  followed  by  the 
mechanized  attack,  defense,  and  non-mechanized  attack. 

Performance  differences  between  the  first  and  second  missions  may 
have  been  caused  by  improvement  with  practice  or  by  differences  in 
difficulty. The contributions of practice and difficulty were
confounded, because the second mission was always an attack. The results
do  indicate,  however,  that  the  mechanized  missions  were  more  difficult 
TABLE 21
Subtasks Identified as Critical
for Battalion Command Group Training

                                                       Previous  Present
                                                       Study     Study

TASK 1.  Develop plan based on mission.
  1B.  Identify critical combat information and
       intelligence.                                      X         X
  1I.  Plan fires.                                        X         X
TASK 2.  Initiate intelligence preparation of the
         battlefield.
  2A.  Identify critical combat information and
       intelligence.                                      X         X
  2B.  Gather critical combat information and
       intelligence.                                      X         X
  2C.  Analyze opposing force.                            X
  2D.  Disseminate critical combat information and
       intelligence.                                                X
TASK 3.  Prepare and organize the battlefield.
  3G.  Communicate/coordinate plans and orders.           X         X
TASK 5.  See the battlefield during the battle.
  5B.  Gather critical combat information and
       intelligence.                                      X
  5C.  Analyze opposing force.                            X
  5D.  Disseminate critical combat information and
       intelligence.                                      X
TASK 6.  Control and coordinate combat operations.
  6B.  Coordinate/communicate changes.                              X
TASK 7.  Employ fires and other combat support assets.
  7A.  Modify fire support plan.                                    X
TASK 8.  Concentrate/shift combat power.
  8A.  Determine critical place and time.                 X         X
  8B.  Concentrate/shift combat power in the attack.      X
  8C.  Concentrate/shift combat power in the defense
       or retrograde.                                     X         X
TASK 9.  Manage combat service support assets.
  9A.  Arm and fuel the systems.                                    X
  9D.  Integrate CSS into scheme of maneuver.                       X
TASK 10. Secure and protect the TF.
  10A. Defeat or suppress opposing force's
       electromagnetic intelligence effort.               X         X
TASK 11. Troop lead during battle.
  11A. Supervise compliance with TF order.                          X
TASK 12. React to situations requiring special actions.
  12A. React to opposing force electronic warfare.        X         X
than the non-mechanized ones. In the previous investigation, there
were  too  few  non-mechanized  units  to  permit  a  similar  comparison  of 
mission  performance. 

RATER  RELIABILITY 

The  utility  of  the  command  group  ARTEP  as  a  means  of  diagnosing 
training  deficiencies  and  evaluating  command  group  effectiveness  is 
limited by its reliability as a measuring instrument. Even under the
ideal conditions of this investigation (i.e., experienced
controller-evaluators, a realistic, automated battle simulation, and a
standardized scenario) rater reliability was low. Low reliability can be
tolerated in research, when ratings from many units can be averaged
so  that  rating  errors  tend  to  cancel  out.    However,  low  reliability 
is  a  problem  when  performance  ratings  are  used  to  diagnose  and  evaluate 
individual  command  groups. 

Two steps can be taken in the near term to enhance the
reliability of command group performance ratings: one is to average
the ratings from several different observers; the other is to use the
same raters when comparing different command groups or when evaluating
the performance of a given group at different times. Over the longer
term, however, the improvement of rater reliability will require
continued research to develop more objective measures of command group
performance.

SUMMARY  AND  CONCLUSIONS 

1. Fifteen subtasks of the command group/staff module of ARTEP 71-2
were identified as critical by virtue of being both low-rated and highly
correlated with effectiveness. These subtasks can be summarized briefly
within five functional areas:

a. Fire support: Develop (1I) and modify (7A) the fire support
plan.

b. Intelligence preparation of the battlefield: Identify (1B, 2A),
gather (2B) and disseminate (2D) critical combat information and
intelligence.

c. Operations: Communicate/coordinate plans and orders (3G) and
changes (6B), and supervise compliance with the task force order (11A).
Determine the critical place and time (8A), and concentrate/shift combat
power (8C).
d. Logistics: Arm and fuel the systems (9A), and integrate combat
service support into the scheme of maneuver (9D).

e. Electronic warfare: Combat enemy electromagnetic intelligence
(10A) and electronic warfare (12A).

These critical performances should be given particular attention in the
development and evaluation of command group training programs and
simulations.

2. The four missions observed in this investigation were markedly
different with respect to subtask criticality. All but one (6B) of the
subtasks listed above were critical in the covering force operation,
five (2B, 2D, 3G, 6B and 10A) were critical in the mechanized attack,
one (9A) in the defense, and one (2B) in the non-mechanized attack. The
effects of practice and difficulty were confounded in comparing the
attack missions to the covering force or defense, because the attack was
always the second mission. It can be inferred, however, that the
covering force was more difficult than the defense, since both those
missions were performed first - by mechanized and non-mechanized groups,
respectively.

3.  Rater  reliability,  which  measures  the  internal  consistency  of  a 
rater  or  group  of  raters,  was  low.    The  coefficient  of  reliability  for 
subtask  performance  scores  from  a  single  rater  was  only  .22.  It 
increased  to  .55  when  the  scores  from  four  or  five  raters  were  averaged. 

4.  Individual  raters  differed  in  their  judgement  of  subtask  performance. 
The  differences  among  ratings  of  the  same  command  group  by  different 
observers  were  significant  beyond  the  .001  level. 

5. The effects of mission type, rater reliability, and individual
differences among raters have implications for the measurement of
command group performance. These effects should be controlled when
diagnosing training requirements, comparing command groups, or evaluating
training systems. Specifically, the same type of mission and the same
raters (several raters) should be used when comparing the performance
of different command groups or of the same command group at different
times. In addition, the low rater reliability and significant
differences among raters indicate the desirability of further research to
develop more objective measures of command group performance.
The need for methods of measuring team and unit proficiency, and
the lack of knowledge in this area, are widely recognized. Team
performance measurement difficulties have been noted as fundamental
problems in unit proficiency diagnosis and training evaluation, in both
military and civilian settings (Blum and Naylor, 1968; Defense Science
Board, 1975). Existing combat unit performance measurement techniques
depend largely on judgmental data, and often do not evaluate the unit's
ability in the field (Hayes et al., 1977). Researchers must solve these
measurement problems before they can substantially improve unit training.

A  tactical  training  system,  called  engagement  simulation  (ES),  that 
includes  objective,  accurate  casualty  assessment,  offers  a  potential 
solution  for  team  performance  measurement  in  combat  training.  Objective 
casualty  assessment  provides  the  primary  measures  of  team  proficiency, 
such  as  casualty  exchange  ratios  and  mission  accomplishment.  Recent 
advances in ES procedures have further improved its use for assessing
tactical performance. This paper reviews application of ES to unit
measurement, with emphasis on lessons learned while validating ES
procedures for armor units, and while developing ES for armored cavalry
units.

ENGAGEMENT  SIMULATION 

ES techniques provide realistic tactical training under conditions
that simulate the complex modern battlefield. In addition to the
casualty assessment, characteristics of ES that contribute to the realism
are that it uses two-sided, free-play tactical field exercises, and it
simulates weapons effects and signatures.

Objective  casualty  assessment  is  achieved  when  a  soldier,  looking 
through  a  6-power  telescope  mounted  on  his  rifle,  correctly  reads  a 
3-inch,  two-digit  number  on  the  helmet  of  an  opposing  unit  member.  The 
telescope  power  and  helmet  number  size  are  calibrated  to  produce  hit/kill 


The  views,  opinions,  and  findings  contained  in  this  report  are  those 
of  the  author  and  should  not  be  construed  as  an  official  Department  of 
the Army position, policy, or decision, unless so designated by other
official  documentation. 
probabilities realistic for the weapon's lethality. When the soldier
fires  a  blank  round  and  correctly  identifies  the  opposing  helmet  number, 
a  casualty  is  assessed.    A  controller  with  the  fire  team  radios  the 
helmet  number  to  the  controller  with  the  opposing  team,  who  informs  the 
"hit"  soldier  (US  Army  Infantry  School,  1975). 

Analogous objective casualty assessment, weapons effects, and
signature simulation procedures have been established for infantry,
armor, and antiarmor elements, including these weapons systems: M60 tank
main gun, mines, hand grenades, machine guns, and light (LAW), medium
(DRAGON), and heavy (TOW) antitank weapons. For weapons with longer
ranges than that of the rifle, the controller is equipped with optics to
sight individual helmet numbers and numbers on panels attached to
vehicles. In the tank, for example, the controller's telescope is
mounted in the
breech  of  the  main  gun.    When  the  controller  in  the  tank  determines  that 
the  gun  is  centered  on  a  target  at  the  time  of  simulated  round  impact, 
he  assesses  a  casualty.    The  controller  then  radios  the  number  of  the 
"hit"  in  the  same  way  as  described  for  the  rifle  casualty  assessment. 

The  radio  net  over  which  the  controllers  announce  the  casualties  is 
used  by  senior  controllers  to  administer  the  exercise,  and  is  monitored 
by  personnel  who  record  the  "hits."    They  write  the  time,  target  number, 
and firer number on a net control sheet, and they check that the "hit"
was  confirmed  by  the  controller  in  the  target  vehicle. 

All ES systems provide some way of identifying casualties. In the
REALTRAIN system, telescopes and numbers are employed, and have been
used for training with opposing forces as large as reinforced platoons.
In order to achieve tactical realism in larger units, a Multiple
Integrated Laser Engagement System (MILES) has been developed. MILES
employs low-power, eye-safe laser transmitters mounted on each weapon.
Each target (vehicle or soldier) has solar cell detectors which receive
the laser signal as either a hit or a near miss. Hits activate a buzzer
on the target which can be silenced by deactivating the target's laser
transmitter. The lasers are pulse coded to differentiate weapons'
effects (e.g., rifles can kill individuals but cannot kill tanks).
Employment of the lasers is expected to reduce the need for human
controllers, and simultaneously, to reduce the amount of data on tactical
activities.
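
The pulse-coding idea can be illustrated with a small lookup table.
This is a hypothetical sketch, not the MILES specification: the weapon
and target names, the `LETHAL_AGAINST` table, and the `assess` function
are assumptions made for illustration. Only the rule that rifles can
kill individuals but cannot kill tanks comes from the text.

```python
# Hypothetical weapon-code table; only the rifle row reflects the text.
LETHAL_AGAINST = {
    "rifle": {"soldier"},
    "tow": {"tank"},            # assumed for illustration
    "tank_main_gun": {"tank"},  # assumed for illustration
}


def assess(weapon_code, target_type, signal):
    """Return the simulated outcome for one decoded laser signal.

    signal is "hit" or "near_miss", the two cases the detectors report.
    """
    if signal == "near_miss":
        return "near miss"
    if signal != "hit":
        raise ValueError(f"unknown signal: {signal!r}")
    if target_type in LETHAL_AGAINST.get(weapon_code, set()):
        return "kill"
    return "no effect"


print(assess("rifle", "soldier", "hit"))  # kill
print(assess("rifle", "tank", "hit"))     # no effect
```

Keeping the weapon-to-target rules in one table means the decision
logic does not change when a weapon system is added.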

ES differs from some of the more frequently encountered simulation
techniques. It is not a board game or computer simulation, but is
conducted in training fields, with a full complement of soldiers and
equipment. Although it employs the tactical equipment, it emphasizes the
human behavior: it is man-ascendant rather than machine-ascendant. The
decisions, reactions to events that emerge during competition with a
motivated, intelligent adversary, and other responses to the environment
are emphasized. The cues to which soldiers must respond are the same as
those to which they respond in battle, and the situation changes as a
result of their actions. Thus, the situation is emergent rather than
prespecified, highly predictable, or amenable to analytic solution
(Boguslaw and Porter, 1966).
PERFORMANCE ASSESSMENT


The objective casualty assessment in ES provides some, but not all,
of the necessary performance measurement. While casualties (target,
firer, and time) are the primary criteria, relying solely on them makes
it difficult to determine why they occurred. Additional observations,
or measures of active performance, are required when the final outcome
is not an adequate index of the skill (Cronbach, 1960). Measures of
processes, or intermediate task and training objective performance,
assist in training diagnosis and explanation of product data. An example
is the detection and engagement of the enemy at the maximum possible
range during defensive missions. Particularly at company level and below,
there is little recognition of the importance of observation posts to
provide exact and timely information concerning the enemy. In exercises
between relatively untrained units, most critical decisions and actions
occur along the forward edge of the battle area. As the units become
more sophisticated, leaders in the defensive unit spend a greater percent
of their efforts selecting observation post positions, planning
communications and indirect fire, and positioning long range direct fire
weapons. Consequently, detection and effective engagement ranges
increase.

Tactical outcomes depend upon several factors other than the
proficiency of the units: forces, missions, weather, and interactions
among these factors can influence the tactical results. For example,
weather interacts with force mixes, since poor visibility favors
dismounted troops, to the disadvantage of long range weapons. If
visibility improves during the tactical action, then the advantage
reverts to the long range weapons. Due to these interactions, the outcome
does not necessarily indicate the relative proficiency of the opposing
forces. The impact of external factors must be considered before the
results of an exercise can be used to diagnose proficiency.

Problems arise in both the recording of behavior (active performance,
or processes) and the encoding of the environment (such as the
external  factors).    Thus,  observational  field  research  needs  a  system 
for  detecting,  measuring,  and  recording  the  events  and  other  factors 
pertinent  to  the  action  (Sells,  1966). 

Literature on ratings and observational performance assessment
techniques in criterion development offers suggestions to improve field
measurement (Blum and Naylor, 1968; Goldstein, 1974; Guilford, 1954;
Simon, 1969; Wherry, 1952). Observations and ratings of behavior can
suffer from unreliability and inaccuracy due to a variety of error
sources. First, the performance itself is inconsistent, since people
perform better at some times and under some conditions than others.
This is especially true in emergent situations, where a given behavior
may not be required in a specific instance, or may be altered to suit
the situation. Second, the detection, or observation, of the behavior is
unreliable. An observer may or may not detect a given behavior, and
different observers may vary in correctness of perceiving and assessing
it. Third, recording of behavior introduces error, depending on the
type of record. For example, immediate recording of events as they
occur reduces error by decreasing recall, or memory, effects. Despite
these error sources, observations and other judgmental measures continue
to be the most frequently used type for performance criteria (Blum and
Naylor, 1968).

Improved measurement can be achieved when the researcher (a)
specifies and defines as concretely as possible the behaviors to be
observed, (b) requires data collection personnel to observe, but not
judge, the behavior, (c) trains the observers fully, and (d) requires
observers to record their observations immediately. The following
sections discuss how we applied these principles, and used observational
techniques in conjunction with objective measures.

OBJECTIVE  MEASURES 

The use of casualty, time, and mission accomplishment data is
demonstrated by results from the validation of armor REALTRAIN (Scott,
Meliza, Hardy, Banks, and Word, 1978). Teams composed of tanks, heavy
antitank weapons (TOW), and artillery forward observers were pretested
against a similarly composed opposition force (OPFOR). Half of the
tested teams then had a week of REALTRAIN training, while the others had
conventional tactical field training. The teams were posttested against
the OPFOR. Casualty data show that the teams were similar in pretest
performance (each bar in Figure 1 represents 52 vehicles). REALTRAIN
teams improved in terms of casualties inflicted on the OPFOR, as seen in
the posttest data, while conventionally trained teams did not.

Temporal distributions of the casualties during an exercise provide
additional insight into changes in tactical performance. When the
cumulative percent of tested unit casualties is plotted against the time
elapsed, it appears that fewer casualties are sustained early in the
exercises after REALTRAIN training, in contrast to heavy early losses
before training (Figure 2). Conventionally trained units sustained heavy
early casualties both before and after training. Time data, in
association with other objective data such as casualties, can be of help
in measuring what went on during an exercise and what may have led to
successful (or unsuccessful) mission accomplishment.
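
A cumulative casualty curve of this kind can be computed directly from
the times written on the net control sheet. The sketch below is a
minimal illustration: the function name, the checkpoint times, and the
casualty times are hypothetical and are not taken from the REALTRAIN
data.

```python
def cumulative_percent(casualty_minutes, checkpoints):
    """Percent of all recorded casualties sustained by each checkpoint."""
    if not casualty_minutes:
        raise ValueError("no casualties recorded")
    total = len(casualty_minutes)
    return [
        (t, 100.0 * sum(1 for m in casualty_minutes if m <= t) / total)
        for t in checkpoints
    ]


# Hypothetical casualty times, in minutes from the start of an exercise.
heavy_early = [2, 5, 8, 25, 40]     # most losses in the first 10 minutes
spread_out = [12, 28, 35, 44, 52]   # losses distributed over the hour

for label, times in (("heavy early", heavy_early), ("spread out", spread_out)):
    print(label, cumulative_percent(times, (10, 30, 60)))
```

Two units with the same number of casualties can produce very different
curves, which is the information the temporal distribution adds to the
casualty totals.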

Mission accomplishment results show the same patterns of REALTRAIN
effectiveness as did the casualties. In order for a tested unit to
accomplish its attack mission, it had to clear an objective of OPFOR
elements and occupy the objective. To accomplish its defense mission, it
had to prevent the OPFOR from occupying an objective for sixty minutes.
Mission accomplishment data for both attack and defense missions are
combined in Figure 3, where each bar represents 8 exercises. REALTRAIN
teams improved in their ability to accomplish their mission successfully,
while conventionally trained teams did not.
Other objective data, such as artillery fire planning and use, are
also recorded. An indirect fire data form is completed by personnel in
the fire direction center, indicating the number of rounds fired, time
distribution, and casualties inflicted. The example in Figure 4 shows
that "jeep 28" was hit by six rounds early in the exercise, but that no
other indirect fire missions for this team were effective in this
exercise. Inclusion of these data further clarifies explanation of the
overall results.

ARMORED  CAVALRY  ENGAGEMENT  SIMULATION 

Unlike other ES applications, armored cavalry ES cannot rely on
casualties as performance data. "Cavalry's basic tasks are
reconnaissance and security" (Department of the Army, 1977), and may not
be casualty-producing. The armored cavalry platoon performs information
gathering and reporting functions. When reconnaissance units withhold
fire (e.g., to avoid disclosing their positions), tactical events may
not lead to casualties. While developing ES procedures for cavalry
units, the problem was to develop a realistic training environment for
the reconnaissance functions, while maintaining the objectivity and
credibility of the casualty-producing ES exercises. Thus, the cavalry ES
research focused on process measures and external factors.

An armored cavalry ES training program was designed with help from
the training personnel from the unit providing support in the 3d Armored
Cavalry Regiment, Fort Bliss, Tex. Research results have been reported
previously (Knerr, Hamill, and Severino, 1978; Knerr, Stein, Hamill, and
Severino, 1978).

Only two weeks were available for the program, so it was not
feasible to test all combinations of missions, force structures, and force
ratios. The armored cavalry force was a regimental cavalry platoon,
containing scout, light armor, infantry, and mortar sections. The OPFOR
was a combined arms team composed of tank, TOW, and infantry sections,
with simulated indirect fire support. For each mission, the OPFOR
composition was varied to enhance realism and provide reasonable
opposition. The missions selected were reconnaissance (area, route, and
zone) and delay (Table 1).

In these exercises, weather and terrain had strong effects on
mission accomplishment. The weather was clear and sunny, providing optimal
visibility. The terrain was flat desert, although there were sand dunes
that could hide vehicles and soldiers. Moving vehicles were quickly
detected by exhaust smoke and dust clouds from the tracks. The force
assigned an attack mission, or moving mission of any sort, was at a
disadvantage under these conditions.

Relative combat power interacted with other external factors.
Results of an attack with a 3 to 1 force ratio differ from results with
6 to 1 odds. If the opposing force is either too strong or too weak,
differences between the units may not emerge due to "ceiling" effects.
During the first two days, the cavalry had reconnaissance missions and
the OPFOR had a strong composition (main battle tanks, TOW, and
infantry). After being hit hard on the first day, the cavalry moved so
slowly on the second day that it made little progress. They did send
reports of enemy strength to the commander, and it was realistic that they
did not move forward in a "suicide" mission against the heavy, long range
weapons they detected. On subsequent days, the OPFOR was reduced and
the action was more realistic.

Table 1
PLATOON MISSIONS BY EXERCISE

Exercise   Cavalry Platoon Mission    OPFOR Platoon Mission

   1       Zone Reconnaissance        Delay
   2       Route Reconnaissance       Screen
   3       Flank Guard                Route Reconnaissance
   4       Area Reconnaissance        Delay
   5       Route Reconnaissance       Attack
   6       Delay/Defend in Sector     Zone Reconnaissance

External factors (missions, terrain, weather, forces) need to be
considered in interpreting mission outcomes as measures of unit
proficiency in tactical situations. Figure 5 shows the outline of a data
form used to describe the exercise. The record starts with a description
of the exercise lane (usually augmented by a map or sketch), weather,
general tactical situation, missions, force structures, and other
external factors or chance events. Next are notes of the platoon leaders'
plans, and orders to the vehicle commanders. Complete notes of the
tactical activities are then recorded, along with the mission outcomes.
These notes on plans, orders, and tactical activities provide an
overview of processes, i.e., active performance during the exercise.

PROCESS  MEASURES 

Process  measurement  in  the  armored  cavalry  ES  development  was  based 
on  the  principles  described  earlier  for  the  improvement  of  observational 
measurement:    train  observers;  specify  the  behavior  to  be  observed  as 
precisely  as  possible;  and  record  during  the  action.    Observers  received 
initial  training  during  three  days  of  small  scale  exercises  that  pre- 
ceded the  full  scale  platoon  exercises.    These  small  exercises  also 
familiarized  the  observers  with  the  terrain,  equipment,  maneuvers,  and 
data  collection  forms.    Observers  were  thoroughly  briefed  each  day  on  the 
exercise  scenario,  operations  orders,  and  anticipated  tactical  events. 

In  the  first  exercise,  the  cavalry  platoon  had  a  zone  reconnais- 
sance mission.    To  clarify  the  behavior  to  be  observed  and  recorded, 
more  detail  was  needed  than  is  given  in  the  cavalry  ARTEP  (Figure  6; 
Department  of  the  Army,  1976).    To  perform  effectively,  the  commander 
needs  to  know  the  location  and  status  of  friendly  forces,  and  the  loca- 
tion and  strengths  of  enemy  forces.    The  reconnaissance  elements  had  to 
learn  the  importance  of  detecting  the  enemy  at  the  maximum  possible 
range,  and  reporting  the  information  to  the  commander.    For  example, 
they  had  to  provide  exact  and  timely  reports  concerning  the  enemy  to 
enable  effective  use  of  indirect  fire. 

To support these training objectives, the operations orders for the
first exercise gave the cavalry platoon a zone reconnaissance mission,
with the request that they provide early warning, occupy an objective by
a given time, and prepare to defend. Specific elements of intelligence
and coordinating instructions also clarified their assignment. Essential
elements of intelligence included enemy left in the area, enemy strong
points, and enemy ability to move forward. In the coordinating
instructions, the unit was requested to hold at phase lines and request
permission to cross, and to bypass pockets of resistance. They were under
weapons hold status, in which they could fire only with permission of
the commander. Thus, the general requirements in the ARTEP mission were
stated more specifically, and observable activities were defined.

The  general  situation  described  in  the  operations  orders  was  real- 
istic for  a  weapons  hold  situation.    As  a  result  of  this  status,  the 
vehicle  commanders  frequently  reported  enemy  information,  along  with 
repeated  requests  for  release  from  weapons  hold  and  consequent  permission 
to  fire.    They  used  their  reports  to  build  a  convincing  requirement  to 
fire.    The  weapons  hold  status,  applied  in  the  highly  motivating  ES  en- 
vironment, appeared  to  elicit  concentrated  reconnaissance  reporting. 

Establishing the reporting requirements, and reinforcing them using
the weapons hold status, made tactical communications a valuable data
collection vehicle. The reports contained time and location information
for both friendly and enemy elements. The quality of the data was a
problem, in both accuracy and completeness, owing to radio problems and
to reliance on the tactical participants' own skills in location
reporting. The controllers (e.g., troop commander) had to check the
accuracy of the location information, but at least their task was
narrowed to a manageable size by their having the other report
information.

Additionally,  the  report  data  could  be  corroborated  in  many  in- 
stances by  its  relation  to  objective  data.    In  this  first  exercise, 
there  were  six  casualties,  providing  objective  information  to  confirm 
detections  of  enemy  activity  that  were  reported  in  the  same  time  periods. 
Also,  some  conditions  were  established  to  create  known  situations,  ser- 
ving as  probes  to  test  reconnaissance  capability. 

In the second exercise, items of intelligence interest were placed
at three known locations, as shown on the map sketch used to brief the
observer (Figure 7): an abandoned armored personnel carrier, some
weapons, and an enemy soldier (represented by a mannequin). Reports from
one of the rifle squads early in the exercise indicated that the squad
was not where it should have been; however, there was no way to be sure
of their actual location. When they reached the abandoned vehicle and
correctly reported its REALTRAIN number, however, it was certain that
they had followed the wrong road. The mannequin "enemy soldier" also
enabled observers to record the location of tactical events. Figure 8
shows a map record noted by an observer on the cavalry platoon leader's
vehicle. The controller recorded the times and places that the platoon
leader dismounted to conduct ground reconnaissance. These estimates were
verified when the vehicle reached the known location of the mannequin and
"took dummy prisoner at 1255". The observational data were thus anchored
to a known location. In general, the known locations clarified records
of tactical performance.


DISCUSSION


Often in performance assessment situations, there is a strong
tendency to measure what is easy to measure. For example, the Army
Training Tests, which preceded the current ARTEPs, relied heavily on
subjective checklists concerning the planning, coordination, preparation,
and movement phases of tactical operations. ARTEPs emphasize the
importance of analyzing critical aspects of missions. The major tasks
differentiated for each mission in the ARTEP reflect fundamentals of land
combat more accurately than did the earlier Army Training Tests.
However, extensive training experience with tactical ES has demonstrated
that further improvement can be made in the selection of training
objectives and the measurement of the attainment of the objectives.

The  application  described  in  this  paper  started  with  the  explicit 
definitions  of  some  of  the  processes,  or  intermediate  training  objec- 
tives.   The  objectives  were  clarified  by  instructions  in  the  operations 
orders  that  resulted  in  use  of  tactical  reports  to  augment  the  data  col- 
lection.   The  reports  were  corroborated  using  probes  (of  known  location) 
and  casualty  reports.    Thus,  data  of  questionable  accuracy  were  linked, 
where  possible,  with  more  accurate  data.    Observations  were  recorded 
during  the  exercises  by  data  collectors  who  were  thoroughly  versed  in 
the  training  objectives,  tactical  situation,  missions,  probes,  and 
special  techniques  such  as  use  of  weapons  hold. 

This paper has focused on the nature of tactical data attainable
using ES operations to acquire objective data, and on methods of
enhancing the accuracy of data. ARI is also working on the improvement of
data collection and analysis, using an Automated Tactical Operations
Measurement System (ATOMS), with contractual support from Human Sciences
Integrated. ATOMS comprises data collection instruments, associated
data collection and reduction procedures, and a software package
for summary descriptive statistics from which further analyses may be
made (Epstein, 1978; Root, Knerr, Severino, and Word, 1978).

The inherent difficulty of measuring complex human performance in a
field environment accounts in part for the shortage of satisfactory
methods for unit performance measurement (Wagner, Hibbits, Rosenblatt,
and Schulz, 1977). The methodology described here depends on the whole
system, from clarification of the training objectives through objective
observation and recording, analysis, and explanation in sufficient detail
to show how and why outcomes, such as mission accomplishment, occurred.


EVALUATION OF THE MODIA PLANNING SYSTEM

Capt John R Welsh Jr, USAF-ATC Tech. Appl. Ctr


INTRODUCTION


Rand  Corporation  initially  designed  the  MODIA  (Method  of  Designing 
Instructional  Alternatives)  system  as  a  research  tool.    Air  Training 
Command  (ATC)  has  examined  the  potential  of  the  system  as  a  computerized 
planning  tool  for  use  in  facilitating  course  planning.    The  primary  objec- 
tive of  the  MODIA  system  is  to  provide  a  systematic  process  of  relating 
quantitative  resource  requirements  to  course  design  and  operation.  MODIA 
was  designed  to  enable  planners  to  consider  different  sequences  of  course 
objectives,  alternative  sequences  of  subject  matter,  varying  teaching 
methods  and  types  of  instruction,  and  different  mixes  of  students,  equip- 
ment, and  facilities.    Moreover,  MODIA  simulates  the  way  in  which  students 
progress  through  alternative  course  designs.    The  MODIA  planning  system 
has  four  components:    the  description  of  options  for  course  design,  the 
User  Interface  (UI),  the  Resource  Utilization  Model  (RUM),  and  the  Cost 
Model  (MODCOM).    The  UI  is  the  interactive  portion  of  MODIA,  the  RUM 
provides  the  simulation  of  course  operation  and  the  MODCOM  provides  course 


The initial development of MODIA was completed in October of 1973.
ISD teams from Keesler and Lowry performed a critical design review at
that time. Rand Corporation made several revisions based on the design
review, and the Phase I service test of MODIA was conducted at Keesler
AFB from March 1976 to June 1976. The results of the service test were
reported in ATC Project 76-1 (30 July 1976). The results generally
indicated that MODIA "has the potential to be an effective planning tool
whose use could lead to more cost-effective technical training courses."
Several important questions, however, could not be addressed in the Phase
I evaluation. This study provides data relevant to those unanswered
questions.

During Phase I, Rand personnel reached the conclusion that the MODIA
system was too complicated to be used effectively by the planners
themselves, and as a result, a team of individuals was trained in the
operation of MODIA. This group, subsequently called the interface team,
operated MODIA while the course planners provided the planning data needed
to design courses. The concept of the interface team carried over into
this evaluation.

The physical arrangements at Keesler during Phase II were very similar
to those which existed during Phase I. The special features of these
arrangements included: (1) a room in which the interface team operated a
remote terminal; (2) a Class A telephone line used with an acoustical
coupler; and (3) a MODEM (Bell 103A Data Set) in the computer facilities
connected on a dedicated line to the Biloxi, Mississippi telephone
exchange. One of the primary provisions of both the Phase I and Phase II
tests was that they be conducted on a "non-interference" basis. The hours
of operation for the User Interface Program were to be 0530 - 0700 hours
daily, 1100 - 1200 hours two days a week, and occasionally as other
use dictated. While this schedule was the best that could be devised
under the conditions of the service test, it placed severe restrictions
on the response time of MODIA planning of alternatives and hampered
evaluation of the MODIA system, in that not as many course alternatives
could be generated as were desirable.

The  shakedown  and  debugging  of  the  MODIA  system  on  the  Keesler  H-6060 
took  place  in  October  1977.    The  actual  service  test  for  Phase  II  began 
on  14  Nov  77  and  ended  17  Feb  1978. 

The  Phase  II  evaluation  capitalized  on  the  experience  gained  in  Phase  I, 
while  expanding  the  scope  of  the  evaluation  of  MODIA  by  addressing  new 
questions  about  its  use:    in  planning  specific  types  of  courses  (family 
group  courses);  in  controlling  the  system  by  technical  school  management; 
in  assessing  the  value  of  the  system  to  planners  and  managers;  and  in 
determining  the  data  automation  requirements  of  the  system  now  and  in  the 
1980s. 

The  objectives  of  this  service  test  were  to: 

a. Provide sufficient test data to support the development of a Data
Automation Requirement (DAR) should MODIA be adopted.

b. Determine MODIA's usefulness as a planning tool. This objective
had several aspects. Specifically:

(1) Explore MODIA's usefulness in planning type 3, family group
courses with shared resources,

(2) Assess the utility of the system to the technical training
school management,

(3) Determine MODIA's usefulness given currently existing resource
constraints, current computer support capability, and training policy.

c. Determine MODIA's usefulness as a problem-solving tool.

d.  Determine  the  organizational  configuration  and  operational  procedures 
which  may  be  used  in  applying  MODIA  at  Keesler. 

e.  Determine  resources  required  to  implement  MODIA  at  Keesler  in  the 
immediate  future. 

f.  Determine  what  changes  to  MODIA  are  needed  to  improve  its  effective- 
ness. 

g.  Examine  the  potential  for  using  the  cost  model  (MODCOM)  as  a 
stand-alone  system. 


h. Develop a training program for the use of MODIA, including
development of a "User Interface Team Guide".


METHOD 


ISD Team Make-up. Initially, it was planned that each Instructional
Systems Development (ISD) team would be composed of a curriculum training
specialist, a training resource specialist, and a subject matter specialist.
In practice, however, the interface team member from each of the three
training groups involved in this exercise worked with only one other person
from the training group. This person provided the primary ISD input.
Others were involved as needed in the planning of different parts of a
given course (for example, the planning of 3ABR32831 involved as many as
5 ISD people). The reorganization of the technical training center and a
shortage of experienced planners dictated this deviation from the
evaluation plan. It should be mentioned that the use of fewer people
significantly drove down the personnel cost of planning with the MODIA
system. In contrast to the Phase I cost analysis, which included personnel
costs of many ISD team members, this service test figures personnel costs
associated with the time of only one or at most several (in the case of
3ABR32831) ISD team members. Each ISD member was responsible for revising
the selected courses with inputs from other tech school personnel as
needed.

The MODIA Interface Team. The interface team was composed of a GS-11,
a Master Sergeant, and a Technical Sergeant. The training of the interface
team was accomplished at the Rand Corporation, Santa Monica, California,
during the period 12 Sep 77 to 25 Sep 77. This group of individuals served
as the interface between the ISD planner and the MODIA system.

Course Selection. Courses selected for MODIA service test during
Phase II were:

Training
Group      Course Number    Title

3380 TTG   3ABR32831        Avionics Nav System Specialist
3390 TTG   3ABR27630        AC&W Systems Operator (Manual)
3390 TTG   3ABR27630-001    AC&W Systems Operator (SAGE)
3390 TTG   3ABR27630-002    AC&W Systems Operator (407L)
3410 TTG   3ABR30434-1      Ground Radio Equipment Repairman (Titan)
3410 TTG   3ABR30434-5      Ground Radio Equipment Repairman (Minuteman)
3410 TTG   3ABR30434        Ground Radio Equipment Repairman

Course selection was based on the following conditions:

(1) There were two sets of family group courses (the 27630 and the 30434
courses) that had to be revised.

(2) Low, medium and high student loads were represented.

(3) Different instructional approaches were represented.

(4) All courses were planned through all 5 steps of the ISD process.

(5) One of the courses (32831) was of long duration and used a great
many resources.

The assumption underlying this course selection was that these courses
represented the best planning possible by conventional means. If MODIA
could be effectively used in the ISD process, then both planners and managers
could improve course designs by allowing for more cost effective planning.

MODIA  Service  Test  Costs  --  Data  Collection. 

a. It is important to make a distinction here between Phase I and
Phase II. Phase I results showed that MODIA could be used by training
branch and group level personnel to decrease course costs through better
design if they could use any design they chose, regardless of training
policy or personnel management considerations. Phase II attempted to see
how well they could use MODIA to manage costs in the present training
environment, and within real-world constraints. One of the goals of the
Phase II effort was to examine the cost of the MODIA system in the light
of such constraints. In order to accomplish this goal, all elements of
the system cost were collected as outlined in the evaluation plan.

b. Manpower, facilities, equipment, and computer costs were collected
by KTTC/TTGH. Total equipment and manpower cost breakouts, by course,
are provided at Appendix 1. These costs will be discussed in the Results
Section.

c. The primary cost of MODIA was in the manpower and computer time
required to support the system. These data were gathered through work
logs/time sheets and a terminal log kept by interface team members and ISD
team members throughout the course of the service test. The work logs/time
sheets were filled out on a weekly basis to ensure current and reasonable
estimates of time spent on various portions of the MODIA service test.
Course cost data and cost information for use as inputs into MODCOM were
provided by the Comptroller and by Keesler Production Analysis.

d.  Requirements  for  computer  resources  (CPU  time  and  time-sharing 
storage  requirements)  for  the  various  portions  of  the  User  Interface  are 
provided  at  Appendix  2.    However,  several  recommended  changes  to  the  MODIA 
programs  are  probably  extensive  enough  to  significantly  alter  the  operating 
characteristics  of  the  system. 

e. Specific changes to the MODIA system were provided in written
comments by ISD personnel and all interface team members, as well as
training managers.


f.    Down-time,  equipment  malfunctions  and  waiting  time  were  not 
counted  as  a  direct  cost  of  MODIA  planning,  since  it  was  assumed  that 
such  costs  would  be  minimal  with  a  fully  operational  MODIA  system. 

Questionnaires. Data contained in responses to the questions in all
questionnaires were summarized and consolidated to provide opinion
information on MODIA's usefulness as a planning tool and as a
problem-solving tool. Additionally, the questionnaire comments provided
a basis for recommended changes to the MODIA system.


RESULTS 

General. In response to one of the recommendations in ATC PR 76-1,
Evaluation of the MODIA System, one of the primary purposes of this
evaluation was to determine the utility of the MODIA system to technical
training management. In fact, the Phase I report went so far as to say that
when procedural questions and organizational configuration questions were
resolved, "It appears that SAAS management will be able to show that MODIA
can improve resource management in a technical training environment (para
18g, p. 33, ATC PR 76-1)." The results of the Phase II evaluation dictate a
different conclusion. The next section will first discuss the
implementation and operating costs of the service test at Keesler, and then
report the results as they relate to the objectives previously outlined.

MODIA Phase II Service Test Costs. Implementation of the Phase II
service test cost Keesler Technical Training Center $11,074. Operation of
the system during the service test cost the technical training center
$44,297. (See Appendix 1 for complete cost breakdown by course.) These
figures are considerably different from those obtained during the Phase I
evaluation. For example, the Phase I report placed the implementation
cost at around $36,000. The difference of approximately $25,000
between that service test and this one can be explained by taking into
account several important differences between the two service tests. These
differences include a drastic reduction in the number of personnel and
manhours involved in the shakedown and set-up phase of the service test.
Additionally, fewer manhours were needed to supply interface team members
with planning factors, i.e., there were fewer people involved with
day-to-day operation of the system. More about these implementation and
operating cost differences will be mentioned in the discussion section.
Overall, the total service test cost was less than anticipated, despite
the fact that computer costs were substantially greater than those costs
obtained in Phase I.
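
The cost figures quoted in the paragraph above can be checked with simple
arithmetic. The sketch below uses only the dollar amounts stated in the
text; the variable names are illustrative, and the Phase I figure is the
rounded "around $36,000" value, so the computed difference is itself only
approximate.

```python
# Dollar amounts as stated in the text.
phase2_implementation = 11_074
phase2_operation = 44_297
phase1_implementation_approx = 36_000  # "around $36,000" (rounded in source)

# Total cost of the Phase II service test (implementation plus operation).
phase2_total = phase2_implementation + phase2_operation

# Difference in implementation cost between the two service tests.
implementation_difference = (
    phase1_implementation_approx - phase2_implementation
)

print(f"Phase II total: ${phase2_total:,}")
print(f"Implementation difference: ${implementation_difference:,}")
```

The difference of $24,926 is consistent with the approximately $25,000
figure cited in the text.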

MODIA System Operating Characteristics and Limitations. As mentioned
in the Introduction Section, the size of the UI portion of the MODIA system
was so large that it caused some initial interference with the training
mission and resulted in restricted operating hours for the MODIA service
test. The primary result of the service test experience that pertains to
the operating characteristics is that the UI (and especially the "C"
phase) is much too large. While it is conceivable the UI could be made
smaller and more efficient, serious consideration would have to be given
to size trade-offs involved with the RUM program. This trade-off is
discussed in the Results Section.

In general, the operating problems experienced during the service test
can be traced to the fact that MODIA programs were written for an IBM
370/158, and were somewhat incompatible with the Honeywell 6060 system.
Another important factor in the incompatibility is that the IBM system is
a virtual storage system, while the Honeywell system is a segmented storage
system which uses program overlays. Aside from some basic incompatibility
between the MODIA software and the Honeywell system, there were other
problems encountered in using the UI.

Several specific findings regarding the operation and use of the
MODIA programs were garnered from interface team members' responses in
their questionnaire. Most team members found it easy to assign learning
objectives (course content) on the UI, with some notable exceptions. First,
MODIA was unable to handle assignment of course content under the family
grouping concept. The 250 learning event limit was much too restrictive
for planners to use MODIA for simulating courses with shared resources.
Two of the basic courses planned in this exercise did have 221 and 241
learning events. Basic, single courses could easily fit within the
limitations. However, courses planned under family grouping required up to
three times the 250 learning event limit. Second, garbling prevented the
assignment of lesson objectives which had certain letter/number combinations.
For example, in the 3ABR30434 course, the interface team member entered
subl0309 for a learning objective, but the computer read IDANTCU. This
garbling was a factor throughout the service test. Nevertheless, from
responses to the questionnaire, it appeared that the interface team was
sufficiently trained to be able to handle most problems that arose in
assigning course content.

The assignment of resources to specific learning events, however, was
a different matter. In all cases, interface team members found it difficult,
and in some cases, extremely difficult to allocate resources to learning
events in the way they wanted to make the assignments. The main problem
encountered was in the extremely limited number of different resources
allowed on MODIA (only 30). All interface team members had to "package"
resources in order to make resource assignments using the UI. In some
cases, a considerable number of resources had to be lumped together or
packaged in order to make the resource assignment. In the case of planning
both the 30434 and the 32831 baseline courses, very few of the needed resources
could be assigned in such a way as to depict realistic use of resources.

The limited capability of MODIA to handle the required resource assignment
was commented on frequently. In fact, the interface team members
felt that the 30 resource limitation hampered realistic simulation of
course operation, since the 32831 basic course required the assignment of
80 different resources; in the 30434 baseline course, over 100 different
resource assignments were needed.

Another  major  constraint  of  the  UI  program  was  its  inability  to 
handle  certain  student  arrival  options.    For  example,  staggered  entries 
with variable numbers of entering students could not be simulated. Moreover,
all interface team members voiced the need for a system which could
realistically depict shift operations. While the MODIA system can be
manipulated to allow simulation of courses with a shift operation, the
resultant  product  had  severe  limitations.    Specifically,  the  250  learning 
event  limit  for  the  UI  was  much  too  restrictive  for  depicting  courses  with 
shift operations. For example, Course 3ABR32831 would require approximately
400  learning  events  to  simulate  shift  operation.    Resources  and  learning 
events  for  this  course  would  have  had  to  be  condensed  or  packaged  even  more 
to  simulate  a  two-shift  operation.    Moreover,  course  managers  felt  they  had 
a  better  handle  on  managing  shift  operations  with  current  techniques. 
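
The two limits discussed above can be expressed as a simple feasibility check. The following is a minimal sketch in Python and is not part of MODIA; the function and constant names are hypothetical. The only figures taken from this evaluation are the 250 learning event limit, the 30 resource limit, and the approximate 400 learning events and 80 resources cited for Course 3ABR32831.

```python
# Limits of the UI/RUM as reported in this service test.
MAX_LEARNING_EVENTS = 250
MAX_RESOURCES = 30


def limit_violations(learning_events, resources):
    """Return a list describing which MODIA limits a course plan exceeds.

    Hypothetical helper for illustration; an empty list means the plan
    fits without condensing learning events or packaging resources.
    """
    problems = []
    if learning_events > MAX_LEARNING_EVENTS:
        problems.append(
            f"learning events: {learning_events} > {MAX_LEARNING_EVENTS}"
        )
    if resources > MAX_RESOURCES:
        problems.append(f"resources: {resources} > {MAX_RESOURCES}")
    return problems


# Course 3ABR32831 with shift operation: about 400 learning events
# and 80 different resources (figures quoted in the text).
print(limit_violations(400, 80))

# A basic course with 241 learning events whose resources have
# already been packaged down to the 30 allowed.
print(limit_violations(241, 30))
```

A plan that returns an empty list can be entered as designed; any other plan must be condensed or packaged before entry, which is the step the interface team found made the simulation unrealistic.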

All  interface  team  members  felt  confident  in  using  the  UI  and  all  found 
the  User's  Guide  (provided  by  Rand)  very  helpful  in  working  the  system. 
But  again,  there  was  considerable  difficulty  in  working  around  garbling 
problems.    As  in  Phase  I,  the  numbers  143  and  168  were  read  as  145  and  170. 
This particular garbling caused a problem every time one had to enter learning
event numbers 143 or 168. The problem was surmounted by labeling these
learning  events  as  "sick-call"  and  assigning  zero  time  to  the  learning 
event. 

In  general,  the  interface  team  felt  confident  with  the  simulations 
and  expressed  the  need  for  a  system  like  MODIA,  but  all  members  also 
remarked  that  the  system,  in  its  present  form,  had  too  many  limitations. 
All interface members rated the output of the RUM as fairly easy to
interpret  and  use,  and  of  value  in  the  planning  process. 

The  specific  changes  recommended  for  MODIA  are  dealt  with  in  detail 
later.    Suffice  it  to  say  that  in  the  experience  of  the  Phase  II  service 
test,  the  UI  system  was  too  large  and  inefficient  to  be  used  on  the  Keesler 
H-6060  computer  now,  or  in  the  future.    Based  on  limitations  and  problems 
experienced in this portion of the service test, and on the results reported
in the next section, MODIA should not be adopted for use "as is".
The operating limitations experienced during Phase II were very similar
to those experienced in Phase I.

MODIA  Usefulness  as  a  Planning  Tool 

General.    The  basic  thrust  of  the  Phase  II  evaluation  effort  was  to 
determine MODIA's usefulness as a management planning and problem-solving
tool. The assessment of MODIA's utility must, of necessity, be subjective
and depend on the opinions and judgments of those in the training center
management hierarchy who would use a system such as MODIA. The strategy
of this evaluation effort was to present course managers with MODIA products
(the RUM simulation and Cost Model course costs for each alternative design)
and see if they could use either the simulation information or cost model
information to arrive at more cost effective course designs, while staying
within the limits of command and center level policy directives, manpower
limitations and resource constraints.

The  basic  question  involves  "how"  management  should  use  the  MODIA 
system.    Therefore,  management  responses  and  results  of  alternative  course 
costs will be discussed with respect to obtained results in each of the
technical  training  groups,  respectively.    To  help  the  reader  keep  this 
part  of  the  evaluation  conceptually  straight,  there  are  two  basic  aspects 
of  MODIA  information  that  could  be  of  use.    The  questions  that  address 
these  aspects  are:    (1)    How  useful  was  the  cost  model  information  on 
alternative course designs? and (2) How useful was the simulation information?
The first question is answered by results discussed here, while
results  pertaining  to  the  second  question  are  discussed  later.    In  the 
discussion  to  follow,  each  of  five  basic  courses  in  the  technical  training 
groups  was  simulated  and  cost  for  the  courses  calculated  using  the  cost 
model.    The  "baseline"  course  cost  figures  and  operating  parameters  were 
designed  to  reflect  the  way  the  course  actually  operated  during  1977.  All 
baseline course costs were figured on the 6-hour training day and were
compared with total course cost figures derived using the ATC Comptroller's
figures on costs/graduate in each of the training courses multiplied by
the annual graduates. The results show that the cost model figure for
total course costs agrees closely with a total course cost using the
comptroller's cost factors, a result that agrees with Phase I findings
on cost model accuracy.
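
The comparison just described is simple arithmetic and can be checked against the tables in this section. The sketch below is illustrative only; the function names are hypothetical and the script is not part of MODCOM. The inputs (comptroller cost per graduate, annual graduates, and cost model baseline totals, all in thousands of 1977 dollars) are taken from Tables 2 and 3. Table 1 is omitted because its cost model totals are not legible in the source.

```python
def atc_total_cost(cost_per_graduate, annual_graduates):
    """ATC total course cost: comptroller cost/graduate times annual graduates."""
    return cost_per_graduate * annual_graduates


def difference(model_total, atc_total):
    """Return (absolute difference, difference as a percent of the ATC total)."""
    diff = model_total - atc_total
    return diff, 100.0 * diff / atc_total


# course: (ATC cost/graduate, annual graduates, cost model baseline total)
baseline = {
    "3ABR27630-000": (3.92178, 471, 1717.1),  # Table 2
    "3ABR30434": (8.46088, 402, 3280.2),      # Table 3
}

for course, (per_grad, grads, model_total) in baseline.items():
    atc = atc_total_cost(per_grad, grads)
    diff, pct = difference(model_total, atc)
    print(f"{course}: ATC total {atc:.1f}, difference {diff:.1f} ({pct:.1f}%)")
```

Under these inputs the computed ATC totals and differences agree with the tabled values to within rounding: about 7% below the comptroller figure for 3ABR27630-000 and about 3.5% below for 3ABR30434.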

Because  of  the  time  limits  of  the  service  test,  cost  model  information 
was  obtained  for  three  basic  courses  and  alternatives  for  each  of  the  three 
courses. 

3380  Technical  Training  Group  -  Results  of  Alternative  Costing. 

Table 1 presents the cost of the baseline 3ABR32831 course in comparison
with costs of various alternatives generated by the interface team members,
ISD  people,  and  training  managers. 

While the cost model was designed to be used either in conjunction
with the simulation portion or by itself, ideally planners would cost
out  alternative  courses  they  had  simulated  to  determine  the  most  cost 
effective  option.    In  the  present  case,  five  out  of  eight  alternatives 
were  simulated. 

Referring  to  Table  1,  alternatives  7  and  8  were  less  expensive  than 
the  other  alternatives.   These  two  courses  represented  slightly  increased 

TABLE 1

COST MODEL BASELINE COURSE COST COMPARISONS
WITH ALTERNATIVE COURSE COSTS
(3ABR32831) a

Alternative No.             Baseline c    1 d      2       3       4       5       6       7       8

Data

1. Annual Entry                412        486     486     730    1128    1128     389     389     389
2. Annual Graduates            269        318     477     477     737     737     255     266     282
3. Annual Failures               7          8      13      13      19      19       6       6       6
4. Avg Course Hours            527        515     527     527     527     527     527     616     537
5. Number of Instructors        32         34      59      52      73      63      38      38      38
6. Total Course Cost
   (1977 dollars) a, b      [values for Baseline through Alternative 7 illegible in source]     1708.0
7. ATC Course Cost/Graduate  7.23205
8. ATC Total Course Cost
   ((7) x (2))               1945.4
9. Difference
   (Between 8 and 6)         63.5 (3.2%)

a. In thousands of dollars
b. Adjusted by a factor of 1.1598 x MODCOM value x 1.057 = 1977$
   (Factors provided by Hq ATC/Management Analysis)
c. 6-hr day (Baseline)
d. 8-hr day

course hours over the baseline course (the course as it actually
operated on a 6-hr day and for 527 course hours). The modest reductions
of $80.3K (4.3%) for alternative 7 and $173.0K (9.21%) for alternative 8
were obtained by entering students every 36 hours instead of every 40 hours
as in the baseline, and by decreasing the actual numbers of students entering
the course. Additionally, students were washed out on an average of
217 hours in alternative 8, as opposed to an average of 240 hours to washout
in the baseline. The reduction in alternative 7 was due largely to
reduced student pay, reduced student PCS costs and instructor PCS costs.

All  training  managers  felt  the  cost  information  provided  by  the  cost 
model was of very little use to them. All sampled managers commented that
the cost model information could be used by Hq ATC level people involved
in making decisions about policy impact on course costs. The managers in
this group indicated that the cost savings shown by using the cost model
related to costs not managed by center level managers (student and instructor
pay and PCS costs). These cost savings were in areas most
directly controlled by Hq ATC management actions.

It is significant that the two money saving alternatives (both 7 and 8)
were generated independently of any simulation, i.e., neither of those
two alternatives was put on the RUM. This fact demonstrates that the
cost model may be used as a stand-alone system, but the overriding question
as to "who" should use it is addressed later in this report. Because of
the recommendations generated by training managers in this group as well
as others, the cost model data was given to Hq ATC Comptroller personnel
in the management analysis section for further study and comment.

3390 TTG - Results of Alternative Costing.

Table 2 presents the cost of the alternatives run for the 3ABR27630-000
course. In this case, the baseline course cost generated by the cost
model was within 7% of the ATC Comptroller's figures for the cost of the
basic course in 1977. The second alternative simply represents the course
cost based on an eight-hour training day as opposed to a 6-hour training
day. The resultant savings are trivial ($4.2K). The remaining alternatives
represent various ways of figuring course length based on how policy-dictated
reductions are calculated. While the cost model may give the cheapest alternative
(Alternative 2 in this case), planners still needed to exercise
judgment. The vast bulk of the savings generated by this alternative was
in pay and allowances of students, instructors, and base permanent party
(support) personnel. These costs, while important, are not meaningfully
controlled by managers at the training group level. Additionally, the
cheapest alternative is not always the best. In the case of 3ABR27630-000,
the Complementary Technical Training (CTT) is vitally important to the
course of training. Alternative 4 presented the planners with the best
course length and number of graduates from the standpoint of meeting training
standards. It is important to note that the planners and managers

TABLE 2

COST MODEL BASELINE COURSE COST COMPARISONS WITH
ALTERNATIVE COURSE COSTS (3ABR27630-000) a

Alternative                     Baseline c     1 d       2 e       3 f       4 g

Data

1. Annual Entry                    550         550       550       550       550
2. Annual Graduates                471         471       490       471       481
3. Annual Failures                  11          11        11        11        11
4. Avg Course Hours             215, [illegible], 215, 274 (only four of the five
                                values survive in the source; column placement uncertain)
5. Number of Instructors            25          25        25        25        25
6. Total Course Cost
   (1977 dollars) b             1717.1      1713.9    1567.4    1687.1    1622.[?]
7. ATC Course Cost/Grad         3.92178
8. ATC Total Course Cost        1847.2
9. Difference (between 8 and 6) -130.1 (7%)

a & b - Same as other tables
c. 6-hr day
d. 8-hr day
e. 8-hr day (not adding CTT)
f. 8-hr day course + (CTT - 15%)
g. 8-hr day (Course + CTT) - 15%

TABLE 3

COST MODEL BASELINE COURSE COSTS COMPARISONS WITH
ALTERNATIVE COURSE COSTS (3ABR30434) a

Alternative                     Baseline       1 c        2 d

Data

1. Annual Entry                    591         591        591
2. Annual Graduates                402         424        435
3. Annual Failures                  25          26         26
4. Avg Course Hours                460         520        460
5. Number of Instructors            73          73         73
6. Total Course Cost
   (1977 dollars) b             3280.2      3105.1    298[?].9
7. ATC Course Cost/Graduate     8.46088
8. ATC Total Course Cost
   ((7) x (2))                  3401.2
9. Difference
   (Between 8 and 6)            -121.0 (-3.5%)

a. In thousands of dollars
b. Adjusted by a factor of 1.1598 x MODCOM value x 1.057 = 1977 dollars
   (Provided by Hq ATC/Management Analysis)
c. 8-hr day - same course length
d. (Course + CTT) - 15%

already knew that the mix represented in alternative 4 would be the
best option for them to plan under the new policies. It is also important
to emphasize that the way in which these policy decisions would be implemented
was determined independently of cost model information. The cost
model data confirmed what managers already knew about the effect of recent
policy decisions on course costs.

Both  managers  in  this  group  rated  the  cost  model  information  of  no 
use  at  all  to  them.    They  indicated  that  the  information  could  be  of 
use  to  Hq  ATC  Training  Managers  and  others  in  evaluating  the  cost  of 
current  course  training  and  in  evaluating  the  cost  of  alternative  course 
designs. 

3410 TTG - Results of Alternative Costing. Table 3 contains the results
of alternative costing for the 3ABR30434 course. Only two alternatives were
run through the cost model in this course because time ran out for the
service test. A problem with the amount of time necessary to gather data
for input into MODCOM (cost model) bears examination at this point. This
problem occurred with gathering data to input into all three baseline
courses, but is discussed here for convenience. All interface team members
spent a great deal of time gathering cost information and putting it in a
form usable by the cost model. Specifically, for the 3ABR30434 course,
30% of the total time spent by the interface team members for all phases
of the service test was in gathering cost data for MODCOM (for interface
team members on 3ABR32831 - 50% and 3ABR27630 - 50%). This amount of
time is grossly disproportionate when one considers the inability of managers
to use the final cost information. In any event, the interface team felt
that entirely too much time was spent on this portion of the service test.
The mechanics of inputting the information and obtaining final products were,
on the other hand, extremely easy and presented no problems whatsoever in
terms of usage. Once the baseline cost information was obtained, it was
very easy to generate costs for alternative course designs.

The  two  alternatives  presented  for  the  30434  course  show  a  roughly 
5%  saving  (alternative  1)  for  the  course  planned  on  a  straight  8-hour 
training  day  with  no  CTT  added;  and  a  roughly  9%  saving  when  planning  a 
course  length  with  CTT  added  and  reducing  the  resultant  course  hours  by 
15%. As with the other training groups, this information was not enlightening
to training managers. One of the managers found the cost model
information  very  useful  in  improving  course  manning  structure  and  student 
use  of  equipment.    Both  managers  felt  that  costs  were  generally  determined 
by  ATC  policy,  rather  than  managed  by  training  group-level  personnel. 

MODIA's Usefulness as a Problem-Solving Tool -- The Resource Utilization
Model Simulation. Based on the responses to questionnaires (see
Appendix) given to the training managers in the three training groups
involved in the Phase II service test and on interviews with group and
center-level management, the RUM does not appear to be a useful planning
tool that can be used by these managers to more effectively manage training
resources.

General.    As  mentioned  in  the  Methodology  section,  all  managers  were 
presented  with  completed  course  simulations  and  cost  model  data  for  all 
alternatives  —  all  managers  had  the  products  carefully  explained  to 
them  (8/9  of  the  managers  responded  that  they  had  the  products  explained 
well  enough  to  them  that  they  understood  all  the  products  from  the  RUM 
and  the  cost  model).    All  managers  were  facing  real  course  management 
problems  at  the  time  of  the  evaluation.    For  example,  all  had  to  revise 
courses  from  a  six  to  an  eight-hour  training  day,  all  had  instructor 
shortages,  and  all  had  students  awaiting  training.    The  RUM  simulation 
was  unable  to  provide  the  training  managers  with  unique  information  on 
course operation. From their responses to questions 1, 2, 9, 11, 12, 13,
14, 15, 16 and 17, it was apparent that the MODIA simulation was not
telling the managers something they did not already know about specific
problems in the operation of the courses studied in this service test.
It appeared from responses given in the questionnaire that the RUM simulation
was generally of very little use to the group level managers. From
responses generated during debriefing and from comments on the questionnaires,
the problems concerning the managers were foreseen without the aid of the
simulation. The more pressing problems, such as those concerned with shift
operation, could NOT be realistically simulated on MODIA. Seven managers
said  they  had  little  confidence  in  the  RUM  simulation,  and  all  nine  managers 
felt  they  had  foreseen  the  problems  in  course  operation  just  as  well  or 
better  than  the  simulation. 

At  this  point  it  is  of  interest  to  note  that  enthusiasm  for  MODIA  ran 
high  during  the  service  test  because  of  perceived  potential  of  the  system 
for  helping  managers  solve  some  of  the  problems  that  were  facing  them  at 
the  time.    However,  the  managers  expressed  frustration  with  the  MODIA 
system  when  they  could  not  use  it  to  help  them  manage  those  problems. 
For  example,  one  branch  chief  would  have  liked  a  system  which  would  allow 
him to strategically pull instructors to support Air Force exercises and
still  optimally  operate  the  course. 

At  the  conclusion  of  the  service  test,  several  managers  expressed 
the  need  for  a  computerized  system  which  would  help  them  with  scheduling 
problems,  and/or  a  system  which  would  optimize  the  use  of  certain  resources. 

It was explained to the managers that MODIA was neither a scheduling nor
an optimizing tool. One has to use the RUM simulation to test the feasibility
of the given design the planner brings to the system. Other systems,
such as the Advanced Instructional System (AIS), could be used to resolve
the scheduling and optimizing problems which seem to represent the more
important management problems facing course managers at the center and
group level.
The results of the managers' opinions about the simulation differed
slightly from the opinions of the ISD participants and from the judgments
of the interface team members. Two-thirds of the ISD people felt the RUM
simulation would be of little use in course planning, while two-thirds of
the interface team members felt it would be useful. Both ISD respondents
and interface team members were confident in the simulations of their
respective courses. Of the six respondents (3 ISD people and 3 interface
team members), three said they would seldom use MODIA were it to become a
fully operational system, and three said they would use the system often.
This result compares to the opinions of the training managers, where five
of nine said they would at least use the system "sometime" if it were fully
operational.

In  summary,  the  opinions  of  those  involved  with  the  Phase  II  service 
test  found  the  simulation  lacking  in  certain  respects.    In  general,  there 
were  mixed  feelings  about  the  usefulness  of  the  RUM  simulation.  The 
managers  felt  that  the  simulation  was  of  little  value  to  them,  but  the 
ISD  and  interface  team  members  were  of  the  mixed  opinion  that  perhaps 
there was some value to be had in the way MODIA simulated course operation.
All  individuals  sampled  with  the  questionnaire  felt  that  the  simulation 
of  resource  use  was  less  than  totally  realistic  and  those  most  closely 
involved  with  MODIA  expressed  serious  reservations  about  the  restrictive 
limits  on  the  number  of  resources  that  could  be  planned  using  MODIA. 
A  summary  of  the  training  managers'  responses  in  each  training  group  is 
presented  below. 

The 3380 TTG - 3ABR32831. The interface and ISD participants for
this training group had some difficulty packaging resources for this long
duration and high-flow course. For example, there were 80 important
resources that could not be broken out as desired in the basic course
simulation. In fact, the course structure as it existed in actual operation
could not be accurately depicted. 3ABR32831 could not be depicted
as progressing from group lock-step, to a self-paced portion, then back
to a group lock-step again. The specific problem facing the simulation
of 3ABR32831 was that the students had to be returned to that portion of
the self-paced block from which they were taken in order to complete the
last group lock-step block. The configuration that was simulated had a
self-paced portion at the end of the two lock-step portions with students
arriving at random intervals. The training managers who examined the
simulation  felt  that  such  a  simulation  was  of  very  little  use  or  no  use 
at  all  to  them  --  5/5  responses  were  in  this  category;  4/5  of  the  managers 
in  this  training  group  felt  the  simulation  and  cost  model  information  were 
of  little  value;  and  3/5  had  little  confidence  in  the  final  simulation. 

The  3390  TTG  -  3ABR27630  Course.    The  two  training  managers  in  the 
3390  Technical  Training  Group  found  MODIA  to  be  more  useful  than  did  the 
managers  in  the  other  two  training  groups.    Both  felt  MODIA  would  be  useful 
in  helping  them  manage  course  revisions  better,  both  thought  the  system 
had value to them as managers, and both were very confident in the
results. The difference in the response of this group and the others
can be attributed to the fact that the MODIA simulation helped the managers
spot a queuing problem that existed in the 27630 course operation. While
the managers knew a problem of some kind existed, it seems MODIA highlighted
a possible solution, which was subsequently put into operation,
and the queuing problem was solved.

While MODIA generally provided favorable results in this training
group, several comments by the managers are important in assessing
MODIA's usefulness. First, MODIA could not adequately simulate the group-paced
operation of the course. MODIA, however, can be manipulated to
handle group-paced instruction, but in the present case the options available
for simulating group-paced instruction were not acceptable to training
managers. The managers were not satisfied with the way the resultant
course "looked" in the simulation. Second, MODIA didn't allow the managers
to more effectively manage resources. As the branch chief remarked, "In
its present form, the only useful purpose it serves is to highlight the
facility costs in one single document." The most pressing problems facing
these managers were instructor manning shortages. They felt it would be
futile to exercise a system that merely highlighted the manning problems
they were aware of already.

The 3410 TTG - 3ABR30434. The most predominant remarks made by managers
in this group dealt with the limitations of the MODIA system. Both managers
felt that MODIA in its present form would be of little value and they had
little  confidence  in  the  simulation.    They  both  said  that  they  would  use 
MODIA  often,  if  it  were  substantially  changed. 

Organizational Configuration, Operational Procedures, and Resources
Required to Implement MODIA. As stated in the MODIA Evaluation Plan for
the Phase II service test, the determination of how MODIA should be used
largely depended on how well the planners and managers felt they could use
the simulation and cost information. The results of the two preceding sections
indicate that the RUM simulation of course operation was of too limited
a scope to be of any value in the planning and management of course operation
at the center, group or branch levels.

All  personnel  involved  in  the  service  test  were  queried  as  to  how 
and  where  MODIA  should  be  used  if  it  were  adopted.    There  was  a  wide  range 
in the recommendations as to who should use MODIA. Some managers felt that
only branch level planners should use the system, while most others recommended
use by everyone involved in the planning process, from branch level
to Hq ATC course managers and manpower personnel. Many managers stated
they would not recommend the system as it exists now, but stated that they
could use a MODIA-like system. A particular surprise was the suggestion
by many that Hq ATC level personnel could use cost model information.
This recommendation was surprising in light of the fact that at the
beginning of the service test, managers and other training center personnel
expressed a fear that MODIA would be used by headquarters to impose
unrealistic course policy changes on them. That training managers thought
enough of the cost model to recommend its use by Hq ATC planners speaks
well for the cost model. Again, though, almost all the personnel questioned
did emphasize that the simulation could not be used by them unless it
was considerably changed.

The  comments  and  responses  about  the  best  organizational  configuration 
were clear in the recommendations that one centrally located interface team
could handle the planning of all Type 3 courses at Keesler. By far the
most frequent recommendation was that a well-trained interface team composed
of only 3 members could handle all the necessary planning.

In addressing the questions of resources required to implement MODIA,
the results of this service test have several clear implications. As far
as the manpower required to operate the system, the results of this evaluation
indicate that very small manpower increases would be needed to
operate the system effectively. This result is consistent with the most
prevalent recommendation in this service test, that MODIA be operated
by a centrally located team of about 3 individuals. The unexpectedly low
cost of this service test was achieved for a variety of reasons, dealt
with fully in the discussion section, but in general the results indicate
far fewer manhours involved in operating the system than may have been
estimated based on Phase I results.

In  contrast  to  the  small  manpower  increases  that  would  be  required 
to implement MODIA, the results of the service test indicate considerable
expenditures in computer resources would be required to implement the
MODIA  system. 

The  severe  interference  with  training  caused  by  operating  the  UI  and 
the resultant restriction in operating hours for the service test indicate
that MODIA as it currently is written could NOT be used for eight hours
a day without causing unacceptable impairment of other training being
conducted on the H-6060. There appears to be little possibility of using
MODIA on the B-3500 system, since Hq ATC/ACD has gone on record as stating
the B-3500 system is currently saturated. The computer personnel at
Keesler felt MODIA could not under any circumstances be used in its present
form, since existing computer resources and current training priorities
leave little room for a system as large as MODIA. The unacceptability of
MODIA "as it currently exists" is a consistent theme that runs through
the comments of all those involved with the service test. The recommendations
advanced for making MODIA more acceptable and usable are discussed
in the next section.
Necessary Changes to Improve MODIA Effectiveness. One of the standout
results of this evaluation was that MODIA would have to be dramatically
changed if it were to be an effective planning tool. Far and away the
most prevalent recommendation for change in the RUM was that the limit on
the number of training resources be considerably increased. The current
limit of 30 resources is just not adequate. All courses planned in this
service  test,  as  well  as  one  of  the  courses  in  the  Phase  I  evaluation,  had 
difficulty  working  around  this  limitation.    The  magnitude  of  the  problem 
created  by  a  limit  of  30  resources  is  highlighted  when  one  considers  that 
the  average  number  of  resources  used  in  most  courses  is  considerably  larger 
than  30  (and  can  go  as  high  as  1100  resources  in  one  particular  course). 

In addition to increasing the resource limitations on the UI and RUM
portions of MODIA, it is necessary to decrease the overall size of the UI
program, especially the "C" phase of the UI. This phase requires 70K bytes
of storage in a time-sharing system with 110K bytes available for users.
The large portion of the time-sharing system required by this phase causes
unacceptable interference with other users of the time-sharing system.
This particular recommendation for reduction in the size of the UI is a
result which was also obtained in the Phase I service test.
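The storage pressure described here can be made concrete with a little arithmetic. The sketch below is only an illustration: it takes the per-phase figures from the storage-requirements table that accompanies this paper and the 110K-byte user allotment quoted in the text, and computes the share of that allotment each portion would occupy while running. The names `PORTION_KBYTES`, `USER_POOL_KBYTES`, and `share_of_pool` are hypothetical and are not part of MODIA.

```python
# K bytes of storage per portion of the User Interface, as listed in the
# storage-requirements table for the H-6060 time-sharing system.
PORTION_KBYTES = {
    "I": 42, "S": 42, "P": 46, "T": 40,
    "R": 58, "C": 70, "RUM": 48, "RUN": 66,
}

# K bytes available to time-sharing users, as quoted in the text.
USER_POOL_KBYTES = 110


def share_of_pool(kbytes, pool=USER_POOL_KBYTES):
    """Fraction of the user storage pool taken by one portion while it runs."""
    return kbytes / pool


# List the portions from largest to smallest with their share of the pool.
for name, kbytes in sorted(PORTION_KBYTES.items(), key=lambda item: -item[1]):
    print(f"{name:>3}: {kbytes:3d}K  {share_of_pool(kbytes):.0%} of user pool")

largest = max(PORTION_KBYTES, key=PORTION_KBYTES.get)
```

Under these figures the "C" phase alone takes roughly two-thirds of the storage open to all time-sharing users, which is consistent with the interference the test participants reported.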

At this point it must be mentioned that while the UI could be rewritten
to be more efficient and still handle the recommended size increases
discussed  below,  the  resultant  size  increases  in  the  operation  of  the  RUM 
would  probably  prove  unacceptable.    Since  the  limits  that  apply  to  the  UI 
directly  affect  the  amount  of  storage  required  by  the  RUM,  increases  in 
the  limits  allowed  on  the  UI  would  greatly  increase  the  amount  of  computer 
time  and  core  storage  demanded  by  the  RUM. 

In  relation  to  the  problem  of  the  overall  size  of  the  UI  program 
(especially the 'R' and 'C' phases), the interface team recommended that
provision  be  made  in  the  program  to  enter  a  given  phase  at  particular 
points  during  the  phase.    As  the  programs  are  currently  written,  the  user 
must  enter  a  phase  at  the  beginning  if  he/she  is  to  make  a  change  and 
the  user  must  go  through  the  entire  phase  and  reenter  all  subsequent 
phases.    As  a  consequence,  a  considerable  amount  of  time  can  be  expended 
for  relatively  minor  changes  in  the  resource  assignments  or  capacities. 
Aside  from  having  to  reenter  all  subsequent  information  and  the  amount 
of  time  and  effort  involved,  having  to  tie  up  the  computer  for  relatively 
minor  changes  impairs  the  cost  effectiveness  of  the  UI. 

While  shift  operations  can  be  planned  on  MODIA  as  it  currently  exists, 
the  programs  should  be  rewritten  so  as  to  allow  more  direct  simulation 
of  the  shift  operation.    Current  options  on  MODIA  necessitate  manipulation 
of  the  MODIA  system  in  such  a  way  as  to  make  simulation  of  shift  operations 
unrealistic  and  unacceptable  to  training  managers. 

Apparent from the garbling of certain number and letter combinations
and the large size of MODIA programs was a certain amount of incompatibility
between the MODIA software and the Honeywell system. MODIA programs currently
cannot be rewritten to alleviate this basic incompatibility, but should
MODIA eventually be adopted, the garbling problem would have to be
resolved.


DISCUSSION 

General.    The  Phase  II  evaluation  effort  differed  from  Phase  I  in  that 
MODIA  planning  was  attempted  in  an  operating  environment  with  planners 
and  managers  judging  and  considering  MODIA  simulation  and  cost  information 
in  the  light  of  current  policy  guidelines  and  resource  constraints.  The 
results  of  their  judgments  and  the  operating  experiences  of  this  service 
test  indicate  that  the  simulation  has  very  limited  value  for  the  management 
of  technical  training  courses  at  the  branch,  group,  and  center  level.  This 
result  is  at  apparent  odds  with  Phase  I  findings  which  seem  to  indicate 
MODIA had potential for helping management design more cost-effective
courses. 

In the Phase I effort, however, the primary concern was determining
whether the simulations of course operations were "valid," i.e., could
they realistically simulate the way courses actually operated, and whether
the cost of MODIA could be offset by more cost-effective designs as
compared with conventionally designed courses. Phase II mainly tried to
determine how course managers might use the system. The results clearly
indicate that the simulation and cost information were of little use to
managers at this level.

No attempt was made to compare MODIA planning with conventional planning.
Such comparisons are a little like comparing apples and oranges.
Were MODIA to be radically changed, the interface team and 2 of 3 ISD
planners felt the system could be an important aid in organizing and
clarifying the planning process. In short, MODIA could have been an
important addition to the planning process, but limitations in the system
as it currently exists precluded managers from seeing it as a positive,
useful tool. Specific highlights of the results will be discussed in
relation to Phase I results and in relation to MODIA's potential for
implementation now and in the future.

MODIA Costs. One of the most striking differences between the Phase I
and Phase II service tests was the cost of implementing and operating the
MODIA system. The main difference in the cost of the two service tests
was in the reduced manpower required to operate MODIA. This reduction
reflects the high level of competence achieved by the interface team.
The experiences of Phase I seemed to have paid some dividends for the
Phase II effort. Particular mention must be made of the quality of training
of the interface team. The interface team members, as well as the
project officer, were very adept at working around problems and seemed to
know a great deal more about how to use the system than may have been the
case in Phase I. All four individuals expressed complete satisfaction with
the thoroughness of the training received at Rand Corporation in Santa
Monica. The monies expended for this training undoubtedly paid big dividends
both in the reduced cost of operating the system and in the quality of the
generated products.

Just how much the experience of this service test would affect the
per hour cost of MODIA planning figures in the Phase I effort is difficult
to guess exactly, since this service test did not address the cost of MODIA
planning vis-a-vis conventional planning. It is safe to say MODIA planning
would not be as expensive as indicated in the Phase I test. The carefully
kept work logs and the comments by virtually all participants (Interface
Team, ISD, and Training Managers) indicated MODIA planning would not unduly
complicate the planning process. They felt the simulation would be a useful
aid in course planning were its limitations corrected to allow more realistic
representation of resource use and more realistic simulation of course shift
operations. The limitation of the UI and the RUM products appears to be the
main factor militating against MODIA effectiveness.

MODIA System Limitations. A look at the recommended changes to MODIA
provides the reader with a base from which to judge the limitations of the
system. The standout limitation is the restricted number of resources that
can be assigned to learning events. This limitation probably reduced the
acceptability of the simulation to the majority of the training managers.
Again, this finding is unenlightening to some extent since Phase I findings
indicated that the limitations degraded its acceptability to course planners.
Phase II results showed the limitations degraded the system acceptability
to managers as well.

In addition to the resource limits, the interface team reiterated the
desire to have a user interface that would be more adaptable. They indicated
a need for a system which could be entered at more points in a given phase,
and which could accept changes within a phase without having to reenter all
subsequent information within a file.

Another major factor influencing acceptability of MODIA was the inability
of planners to realistically simulate courses with shift operations,
and courses with certain configurations. 3ABR32831 had certain portions
of the course where students progressed from lock-step through self-paced
instruction and then back to lock-step again. The course could only be
simulated with the self-paced portion at the end of the two lock-step blocks.
While this resolved the problem of simulating the course with MODIA, it
made the resultant "picture" of course operation unrealistic. The problem
of simulating courses with shift operations was more serious. Managers
expressed a definite need for accurately simulating this type of operation,
but the MODIA system was not designed to handle shift operations in a way
that would be useful to course managers.

In planning 3ABR27630, the managers were chagrined by MODIA's
inability to simulate group-paced instruction. The course was simulated
using the lock-step option, but again, the resultant simulation was somewhat
unrealistic and the training managers expressed their dissatisfaction
with the resulting product.

In sum, while some realistic simulation of the three courses was
achieved, managers felt that inherent limitations of the MODIA system
prevented the simulation from being of any value to them in managing
training courses. A majority of the managers liked the MODIA concept, but
wanted the system to do more than it was designed to do.

As mentioned earlier, there are other scheduling and optimizing models,
such as AIS, which better handle the problems facing course managers.

A question naturally arises about the relatively small number of
managers exposed to the MODIA simulation in this service test. From a
rigorous standpoint, it would be unwise to generalize about the value of
MODIA as a management tool based on the judgments of only nine course
managers. These nine were sampled in this service test because they were
most directly involved in the management of the courses selected for the
service test and in the best position to judge the utility and accuracy of
the MODIA simulation and cost model information. Additionally, the large
reorganization of the former School of Applied Aerospace Sciences under the
training center caused other managers who would otherwise be involved to be
shifted to other organizational positions. Only those managers who could
best judge the accuracy and usefulness were asked to comment on the system.

Arguing for the generalizability of the training managers' judgments
is the fact that the perceived limitations of the system were, by and large,
the same limitations uncovered in the Phase I service test by course planners.
The fact that these limitations were also judged by the training managers
as constraining the usefulness of the simulation to management provides a
reasonable clue as to the value of MODIA simulation to others. It can be
argued here that if the managers most familiar with course operation could
not find the simulation useful or acceptable, no one else could either.

The Cost Model. In general, the training management found the costing
of course alternatives very interesting but of little value to them in
management of course operation or in planning revised courses. The large
majority of managers stated that the cost model information was of little
practical value at their level, but went so far as to suggest the use of
cost model information by Hq ATC level management. The obtained accuracy
of the cost model figures and the ease with which alternative course costs
could be generated argued strongly for its adoption at some level. The
inputs to the cost model portion of MODIA can be obtained independently of
any information provided by the simulation portion of the MODIA system. In
short, from the results of this evaluation, MODCOM could be useful for Hq ATC
level planners and managers. Based on the service test experience, however,
certain caveats have to be issued regarding the time involved to gather
and format information for input into the cost model.

The  input  information  for  use  by  MODCOM  took  a  long  time  to  gather 
and  put  into  a  usable  form.    Regardless  of  who  uses  the  product,  input 
undoubtedly  would  have  to  come  from  branch  and  group  level  planners,  and 
would  have  to  be  updated  regularly  by  the  same  people.    This  effort  would 
naturally  extend  to  all  Type  3  courses  at  each  of  the  training  centers  and 
would  involve  substantial  changes  in  the  way  maintenance  and  cost  data  on 
training  course  resources  are  kept.    Would  the  effort  be  worth  it?  In 
order  to  get  a  feel  for  the  utility  of  the  information  provided  by  the  cost 
model,  the  cost  model  information  on  each  of  the  three  basic  courses  and 
for all alternatives examined in this service test was given to planners
in  the  Management  Analysis  Directorate  of  the  Hq  ATC  Comptroller.  The 
results  of  their  study  of  MODCOM  indicated  that  the  Cost  Model  would  probably 
not  be  of  any  use  at  the  Hq  ATC  level. 

Despite  the  opinion  of  group  and  branch  level  management  that  MODCOM 
information  was  interesting,  but  of  little  value,  it  could  be  argued  that 
these  managers  should  be  using  this  information  regardless  of  current 
practice.    The  argument  may  go  that  just  because  they  are  not  used  to 
considering  costs  of  alternative  course  designs,  they  could  or  should 
make such considerations when they plan or revise courses. This position
involves  the  determination  of  "what"  or  "who"  drives  course  costs.  The 
managers  sampled  in  this  exercise  felt  that  training  philosophy,  as  presented 
in  center-level  and  headquarters-level  policy  guidelines  (as  well  as  practical 
considerations  like  the  Trained  Personnel  Requirement)  drove  the  bulk  of  the 
costs  of  training.    As  it  turns  out,  current  systems  for  managing  course 
costs  are  adequate,  and  the  cost  model  information  would  not  add  anything 
to  current  management  of  course  costs. 

In  sum,  the  major  conclusion  of  the  evaluation  study  is  that  MODIA 
could  not  be  of  sufficient  practical  value  to  managers  at  the  branch, 
group,  or  center  level.    MODIA  was  designed  as  a  research  tool  to  answer 
broad,  "what  if"  types  of  questions  and  probably  should  not  be  modified 
to  provide  the  more  detailed  level  of  simulation  required  by  these  planners 
and managers. In addition to the simulation output from MODIA, the Cost
Model information proved to be of little value to the planners and managers
at this organizational level for similar reasons. Specifically,
these individuals did not manage those costs which represented the largest
part of the variable costs associated with technical training.

It could be argued that planners and managers at higher organizational
levels should routinely use the simulation and cost model information to
examine the impact of broad policy decisions on course operation, but such
usage has inherent limitations. Someone would still have to provide the
baseline course data and keep it up on all courses of interest, and from
the experience of this service test, this would be no small chore.

A more realistic use of the MODIA system probably involves using the
system for what it was designed to do best, namely answering broad research
questions such as: (1) What are the effects of varying student ability on
course design? (2) When is self-pacing best, or with what types of
courses? (3) What is the interaction between student ability and types
of training and course cost? (4) How to best group student training on
expensive equipment? MODIA may allow researchers to approach these and
similar questions without extensive and expensive (and often equivocal)
field studies.


ATC COST INCURRED TO IMPLEMENT/OPERATE
MODIA PHASE II SERVICE TEST


CATEGORY                                  FACTOR       RATE           COST

I.   IMPLEMENTATION COSTS*

     A. Computer Terminal Time            16.53 hr     $220.00/hr     $3,737.00

     B. Other Projects run on
        MODIA Systems                     15.4 hrs      220.00/hr      3,391.00

     C. Communications line
        installation costs
        (installation/removal)                                           200.00

     D. TDY Costs

        1. Interface Team Training                                     3,194.00

        2. Lackland TDY                                                  262.00

     E. MODEM Costs                       5 months       58.00/mo        290.00

     SUB TOTAL                                                        11,074.00 [1]

II.  OPERATING COSTS

     A. 3ABR32831

        1. Computer terminal time         44.8 hr       220.00/hr      9,856.00

        2. Personnel usage

           1 E-6 (Interface)              178           [illegible]      184.45
           2 E-5 (ISD)                    47              5.39/hr        253.33
           3 E-5   "                      97                             522.00
           4 E-4   "                      22              4.65/hr        102.00
           5 GS-9  "                      22.5            9.43/hr        212.24

        Sub Total                                                     11,130.00 [1]

     B. 3ABR30434 (all three courses)

        1. Computer terminal time         16.5 hrs      220.00/hr      3,630.00

        2. Personnel usage

           a. E-7                         133.25          7.55/hr      1,006.00
           b. GS-9                        108.5           9.43/hr      1,023.00

        Sub Total                                                      5,659.00

     C. 3ABR27630 (all three courses)

        1. Computer terminal time         50.3 hrs      220.00/hr    $11,066.00

        2. Personnel usage

           a. GS-11 (Interface)           33.25          11.41/hr        379.38
           b. GS-9 (ISD)                  15.5            9.43/hr        146.17

        Sub Total                                                    $11,592.00 [1]

OTHER COSTS

     Personnel Costs:
     Waiting (Computer Problems -
     Access, etc.)

           E-7                            7.00            7.55           $52.85
           E-6                            97.75           6.45           630.49
           GS-11                          45.25          11.41           516.30

     SUBTOTAL                                                         $1,199.00

TOTALS

     Project Officer (Keesler)

           Capt (69 days x 8 x .6)        331.00         11.01        $3,644.00

     Total Computer Terminal Time Costs                              $31,680.00

     Total Personnel Costs (Manpower Time)                             3,828.00

     Total Implementation                                              3,946.00

     Total Other                                                       1,199.00

     TOTAL COST                                                      $44,297.00

[1] Rounded to nearest dollar.

*   Excludes cost of Rand personnel usage in setting up MODIA.
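The arithmetic behind the cost table can be checked mechanically. The following is a minimal sketch rather than anything used in the service test: it multiplies factor by rate for the rows in which all three printed figures are legible, and confirms that the printed category totals add up to the printed grand total. The names `LEGIBLE_ROWS`, `PRINTED_TOTALS`, `row_gap`, and `check_rows` are hypothetical, and the inclusion of the project officer's cost in the grand total is an assumption inferred from the sums, not something the table states.

```python
# Rows of the cost table where factor, rate, and cost are all legible.
# Each entry: (label, factor, rate in dollars per unit, printed cost in dollars)
LEGIBLE_ROWS = [
    ("I.E MODEM costs (months)",         5.0,    58.00,    290.00),
    ("II.A.1 computer terminal time",    44.8,   220.00,   9856.00),
    ("II.A.2 E-5 (ISD)",                 47.0,   5.39,     253.33),
    ("II.A.2 GS-9",                      22.5,   9.43,     212.24),
    ("II.B.1 computer terminal time",    16.5,   220.00,   3630.00),
    ("II.B.2.a E-7",                     133.25, 7.55,     1006.00),
    ("II.B.2.b GS-9",                    108.5,  9.43,     1023.00),
    ("II.C.1 computer terminal time",    50.3,   220.00,   11066.00),
    ("II.C.2.a GS-11 (Interface)",       33.25,  11.41,    379.38),
    ("II.C.2.b GS-9 (ISD)",              15.5,   9.43,     146.17),
    ("Other: waiting, E-7",              7.0,    7.55,     52.85),
    ("Other: waiting, E-6",              97.75,  6.45,     630.49),
    ("Other: waiting, GS-11",            45.25,  11.41,    516.30),
    ("Project officer (Capt)",           331.0,  11.01,    3644.00),
]

# Category totals as printed in the TOTALS block.
PRINTED_TOTALS = {
    "computer terminal time": 31680.00,
    "personnel (manpower time)": 3828.00,
    "implementation": 3946.00,
    "other": 1199.00,
    "project officer": 3644.00,  # assumption: counted in the grand total
}
PRINTED_GRAND_TOTAL = 44297.00


def row_gap(factor, rate, printed_cost):
    """Dollar difference between factor x rate and the printed cost."""
    return factor * rate - printed_cost


def check_rows(rows, tolerance=1.0):
    """Labels of rows whose product misses the printed cost by more than tolerance."""
    return [label for label, factor, rate, cost in rows
            if abs(row_gap(factor, rate, cost)) > tolerance]


print("rows outside tolerance:", check_rows(LEGIBLE_ROWS))
print("sum of category totals:", sum(PRINTED_TOTALS.values()))
```

With a one-dollar tolerance (the table rounds to the nearest dollar) every row listed agrees with its printed cost. The implementation row for computer terminal time is left out of the list because 16.53 hours at $220.00 per hour does not reproduce the printed $3,737.00; one of those two figures may have been misread in reproduction.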


STORAGE REQUIREMENTS [1] FOR VARIOUS PORTIONS
OF THE USER INTERFACE


PHASE          K Bytes of Storage

I                     42

S                     42

P                     46

T                     40

R                     58

C                     70

RUM                   48

RUN                   66


[1] On H-6060 Time-sharing System


INSTRUCTIONS FOR THE
TRAINING MANAGERS' QUESTIONNAIRE


1. As a manager of various aspects of technical training course operations,
you have been asked to participate in the evaluation of the MODIA
planning system. You are in a position to make a number of assessments
concerning the usefulness of MODIA products to the effective management
of training course operations, and to MODIA's usefulness to the training
manager as a planning tool. In the following pages, you will find a
number of specific questions concerning your experience with MODIA and
opinions as to its potential usefulness. The judgments required of
you are approximate in nature, but please exercise thoughtful consideration
for each question. Only summary statistical results of your responses
combined with responses of others will be used in deciding on the utility
of the MODIA system.

2. Please read each question carefully and indicate your response on
the rating scale by placing a check mark in the appropriate space.

When you are finished, make sure you have completed the general information
sheet, and that you have put your name in the indicated place. Place the
completed questionnaire in the envelope provided and return it to TK30T.
Remember, no other Technical Training Wing personnel will see your responses.
The information will be analyzed and presented in summary statistical form
by the Technology Applications Center. If you have additional comments,
clarification and/or explanations regarding any particular question,
please make them on the back of the sheet containing the question. Please
indicate the question number.


QUESTIONNAIRE FOR TRAINING MANAGERS

1. How useful was the MODIA simulation to you as a training manager,
in spotting potential problems in course operation?

NO USE        VERY LITTLE   MODERATE      CONSIDERABLY  EXTREMELY
AT ALL        USE           USEFULNESS    USEFUL        USEFUL

Please list the potential problems that MODIA allowed you to spot.

- Problems in instructor manning in lab situations

-  Instructional  sequencing 

-  Number  of  required  classrooms 

-  No  problems  not  already  foreseen 

-  Student  bottlenecks 

2. Were any of the problems depicted in the MODIA simulation problems
you would have foreseen without MODIA?

8 Yes     1 No


If your response was yes, please list those problems you could have
foreseen without MODIA simulation and explain how you would have foreseen
them.

- Manpower Utilization

- Delays in student progress

- Costs of Training

- Laboratory Utilization


Which problems would you have been unable to foresee without the MODIA
simulation? (Please list)

- None

- All problems were known before MODIA simulation

- All of the problems established by MODIA programs were foreseen
and attacked without MODIA

3. How realistic were the alternative course designs provided you by
your ISD team member?

    2         [remaining tallies illegible]

TOTALLY       SOMEWHAT      MODERATELY    VERY          VIVIDLY
UNREALISTIC   REALISTIC     REALISTIC     REALISTIC     REALISTIC


Comment on aspects of the alternatives which you feel were helpful
or unrealistic:

- The alternatives could only provide single shift simulation due to amount of
inputs required.

- MODIA was limited because suggested alternatives could not be used because of
the 250 learning event restriction.

- The alternatives that were used indicated the results that were anticipated.

4. How much time did it take for the ISD team to generate alternative
course designs for you?

[tallies illegible]

NO TIME       LITTLE        MODEST        CONSIDERABLE  AN EXTREME
AT ALL                      AMOUNT        AMOUNT        AMOUNT OF TIME


5. How many of the course design changes you recommended were the ISD
interface team members able to incorporate into the alternative course
designs?

    2             3         [remaining tallies illegible]

NONE          VERY FEW      SOME          ALL


Do you have any additional comments on the alternatives the ISD teams
designed on MODIA for you?


6. Could you understand the output of the MODIA simulation?

8 - Yes
1 - Not at all

Comments?


7. Did the ISD team member explain the output to you?

8 Yes     1 No


8. Did you feel that you could understand the simulated course operation
after it was explained to you?

8 Yes     1 No

Comments?


9. In your opinion, would the MODIA simulation of course operation
and course cost enable you to better manage course problems and
resources?

    1             3             -             3             2

VERY MUCH     MUCH          SOMEWHAT      VERY          NOT AT
SO            BETTER                      LITTLE        ALL


Comments?

- The simulation would enable us to find bottlenecks and queuing problems
before they occurred. It would be extremely valuable as a course planning
tool if we were able to program the inputs for a two shift course.

- Program needs expanding to allow other management factors to be considered:
i.e., class schedules, washback related problems for rescheduling, etc.

10. Are there any changes you would like to see made in either the
course simulation or cost information that would make the MODIA system
output more useful to you as a course manager? If there are any changes,
please list them and explain.

- Computer time needs to be increased.

- Limits on the number of inputs requires increasing

- Output needs to be reorganized by higher Hq as valid tool for increasing
or decreasing manning and/or facilities.

- Cost data was very difficult gathering and validating.

- Increase the 250 training event limitation

- Increase type of learning events, teaching formats and teaching agents.

11. How often do you feel you would use MODIA were it to be adopted as
a fully operational system?

    -             4             3             2             -

NEVER         SELDOM        SOMETIMES     OFTEN         CONSTANTLY


12. Overall, how valuable would MODIA be to you in planning a course
revision?

    -             6             1             2             -

NO            VERY LITTLE   MODERATE      VALUABLE      EXTREMELY
VALUE         VALUE         VALUE                       VALUABLE


Additional comments?


13. How confident are you in the results of the simulation of course
operation?

    -             5             2             2             -

NOT           VERY          MODERATELY    VERY          EXTREMELY
CONFIDENT     LITTLE        CONFIDENT     CONFIDENT     CONFIDENT
AT ALL        CONFIDENCE


14. How confident are you in the cost figures shown to you on the
course costs (including the alternative course designs)?

    2             4             1         [illegible]

NOT           VERY          MODERATELY    VERY          EXTREMELY
CONFIDENT     LITTLE        CONFIDENT     CONFIDENT     CONFIDENT
AT ALL        CONFIDENCE


15. How useful a planning tool would the course simulation be to you
as a training manager?

    -             7             -             2             -

NO USE        OF VERY       MODERATELY    VERY          EXTREMELY
AT ALL        LITTLE USE    USEFUL        USEFUL        USEFUL


MODIA  did  not  tell  us  anything  we  didn't  already  know. 


16. How useful was the cost information on the alternative course designs
to you as a manager?

    2             5             1             1             -

NO USE        OF VERY       MODERATELY    VERY          EXTREMELY
AT ALL        LITTLE USE    USEFUL        USEFUL        USEFUL

Comments?


17. Do you feel the simulation of alternative designs could be of
value to you as a course manager?

5 Yes     4 No

18. In what ways would you use information provided by the cost model?


19. What was the cost relationship of the baseline course to the
alternatives you asked the ISD team members to plan on MODIA?


Alternative 1:                    3 = No response

              a. Much more expensive
              b. More expensive
              c. About the same
              d. Less expensive
              e. Much less expensive

[Two tallies of 1 are legible for Alternative 1; their rows cannot be determined.]


Alternative 2:                    3 = No response

              a. Much more expensive
[illegible]   b. More expensive
    3         c. About the same
              d. Less expensive
              e. Much less expensive


Comments or explanations?

20. Were the alternative course designs workable, that is, did they
conform to Air Training Command and Technical School policy?

    1    a. Completely workable

    4    b. Workable with minor changes

    2    c. Somewhat unworkable - major changes required

         d. Totally unworkable

    2 = No response


21. In your opinion, how should MODIA be used (a short sentence or
two)?


22. In your opinion, who should use MODIA (Specify at each
organizational level, i.e., training evaluation, plans, operations,
etc. You can specify NONE or MORE THAN ONE)?

Technical School Personnel (Center Level): 4

Technical Training Group personnel: 3

Branch Personnel: 3 course planners/curricula

ATC personnel: 4

2 - No one should use it.

23. For each organizational level you checked, please, in a sentence
or two, explain how they would use MODIA.


Technical School:


Technical Training Group:


Branch:


Hq ATC:


24. What do you think would be the role of those organizations you
indicated in using MODIA?


25. Please list any additional comments you care to make about your
experience with MODIA, its usefulness, or any suggestions you may have
for improving the system.

- Increase program limits on teaching formats and agents.

- Increase number of learning events

- Improve cost model to permit insertion of other course cost. One weak
area encountered is in expendable supplies. Our training courses use
materials that are costly in supporting performance training in the
laboratories.

-  The  system  must  be  expanded  to  be  worthwhile. 

-  The  programs  need  to  be  expanded. 
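The tallies recorded on the five-point scales in this questionnaire can be condensed into simple summary figures. The sketch below is one hedged way to do so, not the method the Technology Applications Center used: it assumes a blank box means no responses, scores the scale positions 1 through 5 from left to right, and covers only the questions whose tallies are fully legible and run from least to most favorable (11, 12, 13, and 15). The names `TALLIES`, `mean_rating`, and `share_at_or_below` are hypothetical.

```python
# Scale positions scored left to right; treating an ordinal scale as 1..5
# is an assumption made only for the sake of this illustration.
SCALE_POINTS = (1, 2, 3, 4, 5)

# Tallies per scale position; blank boxes are assumed to mean zero responses.
TALLIES = {
    "Q11 how often would use MODIA":        (0, 4, 3, 2, 0),
    "Q12 value in planning a revision":     (0, 6, 1, 2, 0),
    "Q13 confidence in simulation results": (0, 5, 2, 2, 0),
    "Q15 usefulness as a planning tool":    (0, 7, 0, 2, 0),
}


def mean_rating(counts):
    """Mean scale position, weighting each position by its tally."""
    total = sum(counts)
    return sum(point * n for point, n in zip(SCALE_POINTS, counts)) / total


def share_at_or_below(counts, point):
    """Fraction of respondents who chose the given scale position or lower."""
    return sum(counts[:point]) / sum(counts)


for question, counts in TALLIES.items():
    print(f"{question}: n={sum(counts)}, mean={mean_rating(counts):.2f}, "
          f"share at 2 or below={share_at_or_below(counts, 2):.0%}")
```

Each of these questions accounts for all nine managers, and on every one the mean falls below the scale midpoint of 3, in line with the report's conclusion that the managers saw limited value in the system as tested.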


DISASSOCIATED UTILITY OF MORIBUND BRAINS

by 

CDR  C.  F.  Meredith,  USCG 


Thank you, Captain FERGUSON, for the opportunity to present my paper this evening.
After  the  paper  I  presented  last  year  in  San  Antonio,  I  was  classified  as  a  standard 
deviant.    Subsequently,  I  was  advised  to  get  closer  to  NORM;  the  only  NORM'S  I  know 
are  non-standard  deviants. 

The  major  thrust  of  my  research  has  been  in  the  area  of  disassociated  utility  of 
moribund  brains,  acronymically, 


DUMB 


In  reality  the  full  title  of  my  paper  is  Disassociated  Utility  of  Moribund  Brains 
in  Stratified  Higher  Intellectual  Technology. 

Unfortunately,  I  was  unable  to  locate  an  appropriate  acronym  in  the  U.  S.  Government 
Catalog  of  Standard  Acronyms  on  which  to  base  my  paper  and  subsequent  research, 
therefore, I have reverted to the simplistic form DUMB.

Disassociated Utility of Moribund Brains in Stratified Higher
Intellectual Technology

This study degenerated from a self-conceptualized realization that the parathetical
bases for psychomotorial and congenital evaluative processes, derived from replications
of the cause-defect continuum in U. S. military training is, in itself, a process
of debilitating obfuscatory criterion-referenced retrograde directed systemization
which has as its propitiary conclusion a higher order of lesser inactivity in the
non-results-oriented result of out-processing of human resources, or, if you will, why
so many military trainees are revolting.

To encapsulate, in the initiatory process of learner-referenced behavior modification,
symbolism  is  employed  in  varying  degrees  in  representative  relationships.  For 
example,  observe  this  series  of  symbols 


I  I  I  I  I  I 


Each of these inter-related digitally displayed symbols have a cross-related definition,
if you will, an object, an entity, a being unified essence of quantifiable
quality. In laymen's terminology, apples and oranges. Through an interactive
process  involving  psycho-motor  applications,  these  symbols  can  be  interposed  and 
juxtaposed  in  a  variety  of  arrays  to  produce  a  specific  differential  resultant, 
terminally  speaking. 

I shall now depict in graphic form through an interaction of cylinder-form
calcium-based substance and a vertically-oriented green-hued non-organic slate
object, mis-termed a blackboard, how these symbols are most commonly presented
to the learning inputee: 1 + 1 =


The substantive nature of these symbols has been non-empirically transformed. Yet,
and herein lies the crux of my considerations, the arrangement of these data has
not led to a predicted conclusion and if we co-locate an additional non-relative
symbol

1    +   1    =  ? 

our  perception  also  communicates  a  significant  discertitude. 

My  research  to  date  has  led  only  to  a  preliminary  conclusion.    By  a  random  selection 
of  one  symbol  from  the  population  of  similar  data  and  applying  the  aforementioned 
methodology,  I  have  found  that  the  digital  array  can  approach  content  validity. 

1+1    =  2 

Traditionalists  in  our  field  have  supported  my  findings  (OG  4200  B.C.,  Einstein 
1909  A.D.).    On  the  other  hand,  those  who  have  subscribed  to  the  precepts  of 
stratified higher intellectual technology have articulated interrogatism. I would
be remiss in this paper if I failed to replicate the differentiations. But, before
I graphically display the argument against my approach, I shall reiterate synthetically
my self-propagated fear that if the research of the stratified higher intellectual
technologists reaches an unnatural conclusion which is the usual result,
disassociated  utility  of  moribund  brains  (or  dumbness)  will  be  the  terminal 
orientation. 

Their non-articulated objection in sum ostensibly stems from a perceived
non-utilization of inherently dichotomous symbolism leading to and causatory of the
disassociated properties of my partially stratified bias-oriented selection of the
digital  data.    Their  methodology  suggests  the  elimination  of  the  chance-level  symbol 

? 

whereby one is restricted to the imposition of only one additive similar symbol,
and  further  suggests  the  selection  of  three  similar  symbols  thereby  resulting  in 
this  analogous  if  illogical  formulation: 

1    +   1    =  N 

In  conclusion,  I  am  gratified  to  state  that  my  research  reached  termination  in  the 
pre-data  gathering  stage  and  fortunately  will  not  be  published.    I  will  be  happy 
to  question  any  of  your  answers  after  the  conclusion  of  this  evening's  program. 
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REPORT  OF  STEERING  COMMITTEE 


and 


GENERAL  BUSINESS  MEETING  (1978) 


1. The Steering Committee recommended and the membership approved
changes to the by-laws which redefined a quorum of the Steering
Committee  and  instructed  the  Secretary  to  solicit  nominations  for 
the  Harry  H.  Greer  Award. 

2.  A  description  of  the  Harry  H.  Greer  Award  and  its  recipients 
will  be  appended  to  the  by-laws. 

3.  The  German  Armed  Forces  Association  and  the  German  Armed  Forces 
Psychological  Service  Research  Institute  were  accepted  as  primary 
members of the Steering Committee.

4.  A  list  of  the  primary  membership  of  the  Steering  Committee  will 
be  appended  to  the  by-laws. 

5.  The  coordinating  agencies  of  the  next  four  annual  conferences 
will  be: 


1979  Naval Personnel Research and Development
      Center (San Diego)

1980  Canadian Forces Personnel Applied Research
      Unit (Toronto)

1981  Army Individual Training Evaluation
      Directorate (Ft. Eustis)

1982  Naval Education and Training Program
      Development Center (Pensacola)


BY-LAWS OF THE MILITARY TESTING ASSOCIATION*


Article I - Name

The name of this organization shall be the Military Testing
Association.

Article II - Purpose
The purpose of this Association shall be to:

A. Assemble representatives of the various armed services of the
United States and such other nations as might request to discuss and
exchange ideas concerning assessment of military personnel.

B. Review, study, and discuss the mission, organization, operations,
and research activities of the various associated organizations engaged in
military personnel assessment.

C. Foster improved personnel assessment through exploration and
presentation of new techniques and procedures for behavioral measurement,
occupational analysis, manpower analysis, simulation models, training
programs, selection methodology, survey and feedback systems.

D. Promote cooperation in the exchange of assessment procedures,
techniques and instruments.

E. Promote the assessment of military personnel as a scientific
adjunct to modern military personnel management within the military and
professional communities.

Article  III  -  Participation 
The following categories shall constitute membership within the MTA:
A. Primary Membership.

1. All active duty military and civilian personnel permanently
assigned to an agency of the associated armed services having primary
responsibility for assessment for personnel systems.

2. All civilian and active duty military personnel permanently
assigned to an organization exercising direct command over an agency of
the associated armed services holding primary responsibility for assessment
of military personnel.

*As approved at the 1978 General Meeting of the Association 2 Nov 78,
Oklahoma City, Oklahoma


B. Associate Membership.


1. Membership in this category will be extended to permanent
personnel of various governmental, educational, business, industrial and
private organizations engaged in activities that parallel those of the
primary membership. Associate members shall be entitled to all privileges
of primary members with the exception of membership on the Steering
Committee. This restriction may be waived by the majority vote of the
Steering Committee.


Article  IV  -  Dues 
No annual dues shall be levied against the participants.


Article V - Steering Committee

A. The governing body of the Association shall be the Steering
Committee. The Steering Committee shall consist of voting and non-voting
members. Voting members are primary members of the Steering Committee.
Primary membership shall include:

1. The Commanding Officers of the respective agencies of the
armed services exercising responsibility for personnel assessment programs.

2. The ranking civilian professional employees of the respective
agencies of the armed service exercising primary responsibility for the
conduct of personnel assessment systems. Each agency shall have no more
than two (2) professional civilian representatives.

B. Associate membership of the Steering Committee shall be extended
by majority vote of the committee to representatives of various governmental,
educational, business, industrial and private organizations whose purposes
parallel those of the Association.

C. The Chairman of the Steering Committee shall be appointed by the
President of the Association. The term of office shall be one year and
shall begin the last day of the annual conference.

D. The Steering Committee shall have general supervision over the
affairs of the Association and shall have the responsibility for all
activities of the Association. The Steering Committee shall conduct the
business of the Association in the interim between annual conferences of
the Association by such means of communication as deemed appropriate by
the President or Chairman.

E. Meetings of the Steering Committee shall be held during the annual
conferences of the Association and at such times as requested by the
President of the Association or the Chairman of the Steering Committee.
Representation from the majority of the organizations of the Steering
Committee shall constitute a quorum.
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Article  VI  -  Officers 


A.  The  officers  of  the  Association  shall  consist  of  a  President, 
Chairman  of  the  Steering  Committee  and  a  Secretary. 

B. The President of the Association shall be the Commanding Officer
of the armed services agency coordinating the annual conference of the
Association.    The  term  of  the  President  shall  begin  at  the  close  of  the 
annual  conference  of  the  Association  and  shall  expire  at  the  close  of  the 
next  annual  conference. 

C.  It  shall  be  the  duty  of  the  President  to  organize  and  coordinate 
the  annual  conference  of  the  Association  held  during  his  term  of  office, 
and  to  perform  the  customary  duties  of  a  president. 

D. The Secretary of the Association shall be filled through appointment
by the President of the Association. The term of office of the
Secretary  shall  be  the  same  as  that  of  the  President. 

E. It shall be the duty of the Secretary of the Association to keep
the records of the association, and the Steering Committee, and to
conduct official correspondence of the association, and to insure notices
for conferences. The Secretary shall solicit nominations for the Harry
Greer  award  prior  to  the  annual  conference.    The  Secretary  shall  also 
perform  such  additional  duties  and  take  such  additional  responsibilities 
as  the  President  may  delegate  to  him. 


Article  VII  -  Meetings 
A. The Association shall hold a conference annually.

B. The annual conference of the Association shall be coordinated by
the agencies of the associated armed services exercising primary responsibility
for military personnel assessment. The coordinating agencies and
the order of rotation will be determined annually by the Steering Committee.
The coordinating agencies for at least the following three years will be
announced at the annual meeting.

C. The annual conference of the Association shall be held at a time
and  place  determined  by  the  coordinating  agency.    The  membership  of  the 
association  shall  be  informed  at  the  annual  conference  of  the  place  at 
which  the  following  annual  conference  will  be  held.    The  coordinating 
agency  shall  inform  the  Steering  Committee  of  the  time  of  the  annual 
conference  not  less  than  six  (6)  months  prior  to  the  conference. 

D. The coordinating agency shall exercise planning and supervision
over the program of the annual conference. Final selection of program
content shall be the responsibility of the coordinating organization.

E. Any other organization desiring to coordinate the conference may
submit a formal request to the Chairman of the Steering Committee, no
later than 18 months prior to the date they wish to serve as host.


Article  VIII  -  Committees 

A. Standing committees may be named from time to time, as required,
by vote of the Steering Committee. The chairman of each standing committee
shall be appointed by the Chairman of the Steering Committee. Members of
standing committees shall be appointed by the Chairman of the Steering
Committee in consultation with the Chairman of the committee in question.
Chairmen and committee members shall serve in their appointed capacities
at the discretion of the Chairman of the Steering Committee. The Chairman
of the Steering Committee shall be ex officio member of all standing
committees.

B. The President with the counsel and approval of the Steering
Committee may appoint such ad hoc committees as are needed from time to
time. An ad hoc committee shall serve until its assigned task is
completed or for the length of time specified by the President in consultation
with the Steering Committee.

C. All standing committees shall clear their general plans of action
and new policies through the Steering Committee, and no committee or
committee chairman shall enter into relationships or activities with
persons or groups outside of the Association that extend beyond the
approved general plan of work without the specific authorization of the
Steering Committee.

D. In the interest of continuity, if any officer or member has any
duty elected or appointed placed on him, and is unable to perform the
designated duty, he should decline and notify at once the officers of the
association that he cannot accept or continue said duty.


Article  IX  -  Amendments 

A. Amendments of these By-Laws may be made at any annual conference
of the Association.

B. Amendments of the By-Laws may be made by majority vote of the
assembled membership of the Association provided that the proposed amendments
shall have been approved by a majority vote of the Steering Committee.

C. Proposed amendments not approved by a majority vote of the
Steering Committee shall require a two-thirds vote of the assembled
membership of the association.
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Article  X  -  Voting 
All members in attendance shall be voting members.


Article XI - Enactment

These By-Laws shall be in force immediately upon acceptance by a
majority of the assembled membership of the Association and/or amended
(in force 2 November 1978).
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STEERING  COMMITTEE  MEMBERS 
of  the 

MILITARY TESTING ASSOCIATION

1. Naval Personnel Research and Development Center

2. Naval Education and Training Program Development Center

3. Army Research Institute

4. Air Force Human Resources Laboratory

5. Air Force Occupational Measurement Center

6. Army Individual Training Evaluation Directorate

7. U. S. Coast Guard Institute

8. Canadian Forces Personnel Applied Research Unit

9. Canadian Forces Directorate for Manpower Occupational Structures

10. Royal Australian Air Force Evaluation Division

11. German Armed Forces Association

12. German Armed Forces Psychological Services Research Institute
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HARRY  H.  GREER  AWARD 


The Military Testing Association is an outgrowth of an informal
meeting of representatives of the various armed forces testing agencies
in 1958. The meeting was held at the suggestion (and through the
personal coordination) of CAPT Harry H. GREER, USN, Commanding
Officer of the Naval Examining Center. Thus, CAPT GREER was the
"founder" of the Military Testing Association. In 1962, an award in
his name was created to recognize significant lasting contributions
to the Association while exemplifying the ideals of the Association
and its founder.

The  five  recipients  of  the  award  since  1962  are: 

1962 CAPT Harry H. GREER, USN

1970 COL J. M. McLANATHAN, USAF

1974 MR. C. J. MacALUSO, Naval Examining Center

1977 DR. W. J. MOONAN, Naval Personnel Research
     and Development Center

1977 MR. J. A. BURT, U. S. Coast Guard Institute
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INDEX  OF  AUTHORS 
AND 

LIST  OF  CONFEREES 


ADAMICK,  Daniel  R. 

USAOC&S, Aberdeen Proving Ground, Attn: ATSL-TD-TD,

Maryland  21005 
ADAMS, MAJ Jerome Ph.D.

308»-B  Stony  Lonesome,  West  Point,  NY  10996 
Paper presented: "Leader Sex, Leader Descriptions of Own

Behavior, and Subordinates Description of Leader

Behavior"

ADAMS,  William 

Chief of Naval Education & Training, Pensacola, FL 32508
ADKINS, Homer

Chief of Naval Education & Training, NAS Pensacola,

Florida  32508 
ALLMAN,  CAPT  Thomas  S. 

USAF Squadron Officers School/Chief, Standardization

Division,  SOS/EDVS,  Maxwell  AFB,  AL  36112 
ANDERSON,  Kermit  B. 

1408  Spruce,  Norman,  OK  73069 
ANSBRO,  Thomas  M. 

CDG  CNET  N-5  Bldg  679,  NAVAIRSTA,  Pensacola,  FL  32508 
Paper presented: "Using the Computer to Build the Task

Inventory"

ANZELMO,  CAPT  Ralph  H.  (USMC) 

HQMC  Office  of  Manpower  Utilization  (MPU),  Quantico, 

Virginia  22134 
ASA-DORIAN,  Paul  V. 

Fleet  Anti-Submarine  Warfare  Training  Center,  San  Diego 

California  92147 
AUMENT,  John  (USAF) 

443D  TCHTS/QUV,  Altus  AFB,  OK  73521 
AVERSANO,  Dr.  Francis  M. 

ATTSC-IT-TD,  US  Army  Training  Support  Center,  Fort 

Eustis,  VA  23604 
Paper  presented:    "Task  Analysis:  Destination  or  Journey"  .  . 
BABIN,  Ms.  Nehama 

Army  Research  Institute,  5001  Eisenhower  Ave.,  Alexandria, 

Virginia  22333 

Paper  presented:    "Differential  Field  Assignment  Patterns 
for Male and Female Soldiers"
BARAN, Harry A.

34 Old Yellow Springs Rd., Fairborn, OH 45324
Paper  presented:    "PAM:    A  Methodology  for  Predicting  Air 

Force  Personnel  Availability"   

BARBER,  Herbert  F. 

US Army Research Institute Field Unit, P.O. Box 3122,

Ft.  Leavenworth,  KS  66027 
Paper  presented:    "Critical  Performances  of  Battalion 

Command  Groups"   

BARRON, Clovis J.

USN Naval Education & Training Program Development

Center - Code PDIO, Pensacola, FL 32509
BEEL, C. D.

Naval Manpower Utilization Unit, HMS Vernon, Portsmouth

PO1 3ER, Hampshire, England

Paper  presented:    "Execution  of  Large  Occupational  Analysis 

of  the  Royal  Navy's  Operations  Branch"  

BEGLAND,  CAPT  Robert  R. 

Training  Development  Institute,  USA  TRADOC,  123  Tabb  Lane, 

Tabb, VA 23602

Paper presented: "How Do You Buy 'Good Design': An

Examination of the Army's TEC Program"

BELL, 1LT Steven J., MSC

Training  Evaluation  Division,  DTDE,  AHS,  Superintendent, 

Academy  of  Health  Sciences,  USA,  ATTN:    HSA-TEC,  Fort 

Sam  Houston,  TX  78234 
BENNETT, CPT Oscar D.

Academy  of  Health  Sciences,  ATTN:    HSA-TIP,  Fort  Sam 

Houston,  TX  78234 
BERGMANN,  Joseph  A. 

AFHRL/ORA,  Brooks  AFB,  TX  78235 
Paper  presented:    "Female  Utilization  in  Non-Traditional 

Areas"

BERNSTEIN, LCDR David M.

HQ,  USCG  Reserve  Training  Division,  400  7th  Street,  SW, 

Washington,  DC  20590 

BILLS, CAPT Conrad G.
USAF Occupational Measurement Center, USAFOMC/OMDC,

Lackland AFB, TX 78236
Paper presented: "Evaluation of Computer-Derived Test Outlines
Using Conventional Test Outlines as a Criterion
Reference During Test Development Projects"

BIRDSALL, Walter W.

Naval  Education  &  Training  Program  Development  Center, 
Ellyson,  Pensacola,  FL  32509 
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BLANKENSHIP,  Constance 

Navy  Personnel  Research  and  Development  Center, 

San  Diego,  CA  92152 
Paper  presented:    "The  Premature  Attrition  of  Navy  Female 

Enlistees"   420 

BODRON,  LCDR  Donald  E. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
BOLDT,  R.  F. 

Educational Testing Service, Rosedale Rd., Princeton,

New  Jersey  08541 
Paper  presented:    "Some  Implications  of  Commercial  Test 

Normings  for  Mobilization  Surveys"   633 

BONETTE,  Cedella  J. 

USA  Military  Personnel  Center,  DAPC-MSP-D,  200  Stovall 

St.,  Alexandria,  VA  22332 
Paper  presented:    "General  Overview  and  Initial  Findings  of 

the  Project  on  Job  Satisfaction  and  Retention  of  US  Army 

Enlisted  Personnel"    75 

BOONE, Dr. James O.

FAA Civil Aeromedical Institute, Mike Monroney Aeronautical
Center, P.O. Box 25082, Oklahoma City,

Oklahoma 73125

Paper presented: "A New Procedure to Make Maximum Use of

Available  Information  When  Correcting  Correlations  for 

Restriction  in  Range  Due  to  Selection"   906 

BOSSHARDT,  Michael  J. 

Personnel  Decisions  Research  Institute,  2415  Foshay  Tower, 

Minneapolis,  MN  55402 
Paper  presented:    "Content  Validation  of  Class  A  School 

Curricula  in  the  Coast  Guard"   1107 

BOTHWELL,  Cheryl 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

BOWER,  CAPT  Frederick  B.  Jr. 

USAF  Occupational  Measurement  Center,  Lackland  AFB 
Texas  78236 

Paper presented: "The Stability Over Time of Air Force
Enlisted  Career  Ladders  as  Observed  in  Occupational 

Survey  Reports"    228 

BOWNAS,  David  A. 

Personnel  Decisions  Research  Institute,  2415  Foshay  Tower, 
Minneapolis,  MN  55402 

Paper  presented:    "Content  Validation  of  Class  A  School 
Curricula in the Coast Guard"   1107
BOWSER, Samuel E.

5900 Lake Murray Boulevard, La Mesa, CA 92041

BRADNER, Dr. Cleveland Jr.

PD5, Naval Education & Training Program Development
Center, Ellyson, Pensacola, FL 32509

BREWER, ENS David B.

Reserve  Training  Division,  USCG  Headquarters, 

Washington, DC 20590

[name illegible]

USCG Institute, P.O. Substation 18, Oklahoma City, OK
73169 

BURNS,  Darla  J.  u    ,  ni^ 

USCG Institute, P.O. Substation 18, Oklahoma City, OK

73169

[name illegible]

USA Military Personnel Center, ATTN: DAPC-MSP-SM,
200 Stovall St., Alexandria, VA 22332
Paper presented: "Evaluating the Army Occupational Survey
Program Methodology: Answer Booklets, Questionnaire
Length, and Population Coverage"

[name illegible]

USCG Institute, P.O. Substation 18, Oklahoma City, OK
73169

BURTCH, Lloyd D.

Air Force Human Resources Laboratory, Brooks AFB,

Paper presented: "A Methodology to Evaluate the Aptitude
Requirements of Air Force Jobs"

BURTON, LTJG Richard T.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
BYHAM, W.

Development Dimensions, Inc., Pittsburgh, PA

Paper presented: "Development of the Army ROTC Management
Simulation Program and Instructors' Orientation Course"   1091

[name illegible]

USCG Institute, P.O. Substation 18, Oklahoma City, OK
73169

""'TEN?RoK"nSlt?onal.  Inc..  14023  Rocky  P1ne  Woods, 
San  Antonio,  TX  78249 

[name illegible]

Naval Education & Training Program Development Center,
Box 212A Rt. 4, Pensacola, FL 32504

[name illegible]

USCG Institute, P.O. Substation 18, Oklahoma City, OK
73169
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CASTELNOVO,  Anthony  E. 

US Army Research Institute, P.O. Box 3066, Ft. Sill,
Oklahoma  73503 

Paper  presented:    "Development  of  the  Army  ROTC  Management 
Simulation  Program  and  Instructors'  Orientation  Course"  . 
Paper  presented:    "Prediction  of  Field  Artillery  Officer 

Performance"  

CHAGALIS,  CPT  George  P. 

Academy of Health Sciences, Room 247, Ft. Sam Houston,
Texas  78233 
CHASE,  LT  Philip  K. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

CHRISTAL,  Raymond  E. 

AFHRL/ORA,  Brooks  AFB,  TX  78235 
Paper presented: "Female Utilization in Non-Traditional

Areas"

CONN,  Barbara  A. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

COOK,  ENS  Deborah  J. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
CORY,  Charles  H. 

Navy  Personnel  Research  &  Development  Center,  San  Diego, 

California  92152 
Paper  presented:    "Assessment  Center  Variables  as  Predictors 

of  On-Job  Performance  Characteristics"  

COWAN,  Douglas  K. 

AFHRL,  Brooks  AFB,  TX  78235 
Paper  presented:    "Civilian  Ground  Safety  Officer  Job  and 

Training  Requirements  Survey"   

CRIMMINS, CWO2 James H.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

CRONIN,  ENS  Michael  J. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

CUMMINGS,  CAPT  William  H. 

ATC  Technology  Applications  Center,  Lackland  AFB,  TX 
78236 

Paper  presented:    "Job  Performance  of  USAF  Bypassed 
Specialists"  
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CUNNINGHAM,  J.  W. 

North  Carolina  State  University,  2205  Hillsborough, 

Raleigh,  NC  27607 
Paper  presented:    "Determining  the  Training  Requirements  of 

United  States  Coast  Guard  Warrant  and  Commissioned 

Officer  Billets"  

CZUCHRY,  Andrew  J. 

Dynamics  Research  Corporation,  Wilmington,  MA 
Paper  presented:    "PAM:    A  Methodology  for  Predicting  Air 

Force  Personnel  Availability"   

DAPRA,  R.  A. 

Development  Dimensions,  Inc.,  Pittsburgh,  PA 
Paper  presented:    "Development  of  the  Army  ROTC  Management 

Simulation  Program  and  Instructors'  Orientation  Course"  . 
DAVILA,  LTJG  Robert  E. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
DAVIS,  D.  Douglass 

Chief  of  Naval  Education  &  Training  (CNET),  Naval  Air 

Station,  Pensacola,  FL  32508 
Paper  presented:    "Data  Base  To  Determination  of  Training 

Content:    A  Manageable  Solution"  

DELONEY, Rebecca

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
DeVRIES,  Philip  B. 

McDonnell  Douglas  Astronautics  Co.,  P.O.  Box  516, 

St.  Louis,  MO  63166 
Paper  presented:    "Methods  for  Collecting  and  Analyzing 

Task  Analysis  Data"   

DeVRIES,  LCDR  Richard  L. 

RESGRU  SW  Hbr  01-88804,  Box  147,  RFD  1,  Rockland,  ME 

04841 

DICKINSON,  Richard  W. 

Computer  Programming  &  Statistical  Analysis,  Occupational 

Research  Program  -  Industrial  Engineering,  Texas  A&M 

University,  College  Station,  TX  77843 
DIETERLY,  Duncan  L. 

USAF  Human  Resources  Laboratory,  Wright-Patterson  AFB, 

Ohio  45433 

Paper  presented:    "PAM:    A  Methodology  for  Predicting  Air 

Force Personnel Availability"

DITULLIO, Paul

USAF Occupational Measurement Center, Lackland AFB, TX

78236
DOORLEY, Richard D.

USA Military Personnel Center, 200 Stovall St., Rm 1S23,

Alexandria,  VA  22332 
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DOW,  Dr.  Andrew  N. 

USNETPDC  -  Ellyson,  Pensacola,  FL  32509 
Paper  presented:    "Objective  Evaluation  of  Correspondence 

Course  Items"   

DREW,  LT  Richard 

Officer  in  Charge,  Central  Test  Site  for  PTEP,  NAVGMS 

Virginia  Beach,  VA  23461 
DREWES,  D.  W. 

North Carolina State University, 2205 Hillsborough,
Raleigh,  NC  27607 
Paper  presented:    "Determining  the  Training  Requirements  of 
United  States  Coast  Guard  Warrant  and  Commissioned  Officer 

Billets"  

DRISKILL,  Dr.  Walter  E. 

USAF Occupational Measurement Center/OMYO, Lackland AFB,
Texas  78236 

Paper  presented:    "Four  Fundamental  Criteria  for  Describing 

the  Tasks  of  an  Occupational  Specialty"   

Paper  presented:    "The  Stability  Over  Time  of  Air  Force 

Enlisted  Career  Ladders  as  Observed  in  Occupational 

Survey  Reports"   

DUFFY,  Paul  C 

Marine  Corps  Institute,  Marine  Barracks  8th  &  I  Sts. , 

P.O.  Box  1775,  Washington,  DC  20013 
DURHAM, MAJ Charles V.

Evaluation  Branch,  Academic  Instructor  School,  USAF, 

AIRFOS,  EDV,  Maxwell  AFB,  AL  36112 
DYER,  Dr.  Frederick  N. 

Army  Research  Institute  Field  Unit,  P.O.  Box  2086, 

Ft.  Benning,  GA  31905 
Paper presented: "Using an Assessment Center to Predict

Leadership Course Performance of Army Officers and NCOs"
EARLES,  James  A. 

AFHRL/PES,  Brooks  AFB,  TX  78235 
Paper  presented:    "The  Content  Issue  in  Performance  Appraisal 

Ratings"  

EASTMAN,  Robert  F. 

US Army Research Institute Field Unit, P.O. Box 476,

Ft.  Rucker,  AL  36362 
Paper  presented:    "Validity  of  Associate  Ratings  of 

Performance  Potential  by  Army  Aviators"   

ELLIS,  Dr.  John  A. 

Navy  Personnel  Research  &  Development  Center,  Code  304, 

San  Diego,  CA  92152 
Paper  presented:    "The  Instructional  Quality  Inventory: 

Introduction  and  Overview"  
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ELLIS,  CAPT  R.  T. 

Canadian  Forces  Personnel  Applied  Research  Institute, 

4900  Yonge  St.,  Willowdale,  Ontario,  Canada
ESCHENBRENNER,  Dr.  A.  John 

McDonnell  Douglas  Astronautics  Co.,  P.O.  Box  516,

St.  Louis,  MO  63166 
Paper  presented:    "Methods  for  Collecting  and  Analyzing 

Task  Analysis  Data"   

ESLICK,  CWO4  David  W.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK

73169 
EVANS,  Ermon  M. 

Chief  of  Naval  Technical  Training,  (CNTECHTRA), 

704  M.  Sherrod,  Covington,  TN  38019 
FARNSWORTH,  LT  Barry  A. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
FARRIS,  John  C. 

Data-Design  Laboratories,  15  Koger,  P.O.  Box  12773,

Norfolk,  VA  23502 
FERGUSON,  CAPT.  J.  E. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
FINE,  Dr.  Sidney  A. 

Advanced  Research  Resources  Org.,  4330  East  West  Highway, 

Bethesda,  MD  20014 
Paper  presented:    "Analysis  of  Heavy  Equipment  Operator 

Jobs"   

FISCHL,  Dr.  M.  A. 

317  Rexburg  Ave.,  Fort  Washington,  MD  20022 
Paper  presented:    "Measuring  the  Military  Base  Population 

of  the  1980's"

FOGLE,  Charles  C. 

FAA  Airman  Examinations  AFS-593,  AERO  Center,  Will 

Rogers  Field,  OK 
FOLEY,  Paul  P. 

3965  Aqua  Dulce  Blvd.,  Spring  Valley,  CA  92077
FORMAN,  Ima  R. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK

73169 

FREY,  Dr.  Robert  L.  Jr. 

USCG  (G-P-1/2/62),  Washington,  DC  20590
GAFNEY,  LT  Edward  J.  III

Strategic  Systems  Project  Office,  Crystal  City  Mall  3,

Washington,  DC  20376 
GENTNER,  CAPT  Frank  C.

USAF  Occupational  Measurement  Center/OMYO,  Lackland  AFB,
Texas  78236 

Paper  presented:    "Four  Fundamental  Criteria  for  Describing 

the  Tasks  of  an  Occupational  Specialty"    204 

GEORGE,  Dr.  Clay  E. 

Dept.  of  Psychology,  Texas  Tech  University,  Lubbock,  TX 
79409 

Paper  presented:    "An  Analysis  of  the  OE  Concept  and 

Suggested  Improvements"   

GERBER,  Dr.  J.  E.  Jr. 

HQ  US  Army  Forces  Command,  ATTN:  AFPR-PSE,  Ft.  McPherson,
Georgia  30330 
GILBERT,  Dr.  Arthur  C.  F. 

US  Army  Research  Institute,  5001  Eisenhower  Ave. , 
Alexandria,  VA  22333 
Paper  presented:    "Prediction  of  Field  Artillery  Officer 

Performance"   839 

Paper  presented:    "Predictive  Utility  of  the  Officer  Evaluation
Battery  (OEB)"    753

Paper  presented:    "Quality  of  ROTC  Accessions  to  the  Army 

Officer  Corps"

GIORGIA,  M.  Joyce

Air  Force  Human  Resources  Lab,  Brooks  AFB,  TX  78235 
GOCLOWSKI,  John  C. 

Dynamics  Research  Corporation,  Wilmington,  MA 
Paper  presented:    "PAM:    A  Methodology  for  Predicting  Air 

Force  Personnel  Availability"    602 

GOLDMAN,  Dr.  Lawrence  A.

USA  Military  Personnel  Center,  DAPC-MSP-D,  200  Stovall  St., 
Alexandria,  VA  22332 
Paper  presented:    "General  Overview  and  Initial  Findings  of 
the  Project  on  Job  Satisfaction  and  Retention  of  U.S.  Army

Enlisted  Personnel"    75 

GOODGAME,  Doug 

Occupational  Research  Program,  Industrial  Engineer  Dept., 
Texas  A&M  University,  College  Station,  TX  77801 
Paper  presented:    "Scheduling  Formal  School  Training  to

Maximize  Cost  Effectiveness"   286 

GOODY,  Kenneth 

AFHRL,  Brooks  AFB,  TX  78235
Paper  presented:    "Benchmark  Scales  for  Collecting  Task

Training  Factor  Data"    556 

GORDON,  Mr.  M.  Meriwether

AF  ROTC,  AFROTC/ACME,  Maxwell  AFB,  AL  36112
Paper  presented:    "Weighted  Selection  System  for  AFROTC 
Applicants— Perspective  After  Second  Year  of  Use"  .... 
GOULD,  Dr.  R.  Bruce

AFHRL/PES,  Brooks  AFB,  TX  78235 
GRAHAM,  Dr.  William  W.  Jr. 

MEPCON,  MEPCT-P,  Bldg.  83,  Ft.  Sheridan  IL  60037 
Paper  presented:    "Development  of  a  Mobilization  Population 

Inventory  Using  Existing  ASVAB  Data  Banks"  

GRIMM,  Richard 

PD-10,  NETPDC  Ellyson,  Bldg.  922,  Pensacola,  FL  32509 
GROETKEN,  LTC  David  L. 

Chief,  Analysis  Div.  Directorate  of  EVAC,  USA  Field 

Artillery  School,  Ft.  Sill,  OK  73503 
GROVER,  Martha  S. 

Defense  Intelligence  School,  Washington,  DC  20374 

USA  Infantry  School,  SFTD,  Directorate  of  Training, 

Ft.  Benning,  GA  31905 
HALADYNA,  Tom 

Oregon  College  of  Education,  Monmouth,  OR  97361 
Paper  presented:    "The  Emergence  of  an  Item-Writing 

Technology"   

HALTRECHT,  Dr.  Ed 

Personnel  Research,  Ontario  Hydro  (H2-D17),  700  University 

Ave.,  Toronto,  Ontario  M5G  1X6 
HANLON,  John  P.

Ft.  Devens,  MA  01433 
HASSALL,  LCDR  James  L. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
HASSEN,  John  E. 

Code  N5B2,  Chief  of  Naval  Education  &  Training  Support, 

Bldg.  997,  Ellyson  Field,  Pensacola,  FL  32509 
HAWRYSH,  CDR  Fred  J. 

Directorate  of  Military  Occupational  Structures,

Canadian  Forces,  National  Defense  Headquarters,  Ottawa, 

Ontario,  Canada  K1A  0K2
HEJL,  CWO4  L.  E.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

HENDERSON,  Robert  G. 

Defense  Language  Institute  Foreign  Language  Center, 

ATTN:    ATLF-TD-JS,  Presidio  of  Monterey,  CA  93940 
Paper  presented:    "The  Defense  Language  Aptitude  Battery 

(DLAB)"   

HENN,  LT  COL  Manfred 

MOD  Germany,  Ministry  of  Defense  -  Armed  Forces  Staff  13, 

Postfach  1328,  5300  Bonn  1,  W.  Germany 
HICKS,  Dr.  Jack  M.

6827  Old  Chesterbrook  Rd. ,  McLean,  VA  22101 
Paper  presented:    "Leader  Sex,  Leader  Descriptions  of  Own 

Behavior,  and  Subordinates  Description  of  Leader  Behavior" 
Chairman,  Symposium:    "Methodology  for  Mobilization
Population  Inventory"

HILLIGOSS,  Richard  E. 

Army  Research  Institute  Field  Unit,  P.O.  Box  2086, 
Ft.  Benning,  GA  31905 
Paper  presented:    "Using  an  Assessment  Center  to  Predict 
Leadership  Course  Performance  of  Army  Officers  and  NCOs". 
HOUTZ,  John  C. 

USA  Recruiting  (USARCASP-E) ,  Ft.  Sheridan,  IL  60037 
HOWARD,  Dr.  Charles  W. 

805  Cortijo,  El  Paso,  TX  79912 
Paper  presented:    "Methodology  for  Evaluating  Operator 
Performance  on  Tactical  Operational  Simulator/Trainers"  . 
HUNTER,  John  E. 

Michigan  State  University,  East  Lansing,  MI  48823 
Paper  presented:    "The  Impact  of  Valid  Selection  Procedures

on  Workforce  Productivity"  

Paper  presented:    "Test  of  a  New  Model  of  Validity
Generalization:  Results  for  Tests  Used  in  Clerical  Selection"
JACKSON,  Alvaline  B. 

3060D  Mower  Court,  Ft.  Meade,  MD  20755
Paper  presented:    "Evaluation  of  Intelligence  Producing 

Capability  of  Selected  Combat  Arms  Units"    .  . 

JACKSON,  LT  COL  David  K. 

AFROTC/ACME,  Maxwell  AFB,  AL  36112 
Paper  presented:    "Weighted  Selection  System  for  AFROTC 
Applicants— Perspective  After  Second  Year  of  Use"  .... 
JACKSON,  William  L. 

Directorate  of  Training  Developments,  Training  Analysis 
&  Design  Division,  Ft.  Rucker,  AL  36362 
JENKINS,  William  J. 

US  Army,  Redstone  Arsenal,  AL  35481 
JENNINGS,  Alan  E. 

FAA  CAMI,  AAC118,  P.O.  Box  25082,  Oklahoma  City,  OK 
73125 

Paper  presented:    "A  Method  to  Evaluate  Performance
Reliability  of  Individual  Subjects"

JENNINGS,  Margarette  C. 

Advanced  Research  Resources  Organization,  4330  East  West 
Highway,  Bethesda,  MD  20014 
Paper  presented:    "Analysis  of  Heavy  Equipment  Operator 
Jobs"   
JOHNSON,  LT  David  G.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
JOHNSON,  Dorothy

Corry  Station,  Cryptologic  Dept.,  Pensacola,  FL  325
JOHNSON,  Kirk  A. 

Naval  Personnel  Research  &  Development  Center,  639  "ly 

St.,  San  Diego,  CA  92120 
JOHNSON,  Robert  N. 

USA  Administration  Center  (DTD),  Ft.  Benjamin  Harrison,

Indiana  46216 

Paper  presented:    "Design  of  Machine  Scorable  'Hands  On' 

Performance  Tests  in  a  Paper  and  Pencil  Mode"   

JONES,  Dr.  Jean 

Army  Research  Institute  for  the  Behavioral  &  Social 

Sciences,  HQ  TCATA  (PERI-OH),  Ft.  Hood,  TX  76544 
JONES,  Karen  N. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
JONES,  Dr.  Todd 

R&D  US  Coast  Guard  (G-DSA-1/TP44),  Washington,  DC  20590
KAHN,  Dr.  Arthur

Westinghouse  Defense  &  Electronic  Systems  Center,  P.O.  Box

746  (M.S.  440),  Baltimore,  MD  21203 
Paper  presented:    "Experimental  Evaluation  of  a  High
Technology  Training  Program"

KAPLAN,  Dr.  Ira  T. 

Training  Development,  US  Army  Research  Institute  Field 

Unit,  P.O.  Box  3122,  Ft.  Leavenworth,  KS  66027 
Paper  presented:    "Critical  Performances  of  Battalion 

Command  Groups"   

KEATES,  CAPT  W.  E.

Staff  Officer  Analysis,  Air  Command  Headquarters,  Westwin, 

Manitoba,  Canada  R2R  0T0
Paper  presented:    "Aircrew  Training  Research  -  Project

ACTIVE"   

KEETH,  James  B. 

USAF  Occupation  Measurement  Center,  Lackland  AFB,  TX

78236 
KINNISON,  Henry  L. 

Dept.  of  Psychology,  Texas  Tech  University,  Lubbock, 
Texas  79409 

Paper  presented:    "An  Analysis  of  the  OE  Concept  and 

Suggested  Improvements"   

KINTOP,  Constance 

MGR  Personnel  Services,  Minneapolis  Personnel  Dept., 
312-3RD  Ave.  South,  Minneapolis,  MN  55415 
KNAUP,  Peggy  A.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK

73169 
KNERR,  Dr.  C.  Mazie 

US  Army  Research  Institute,  5001  Eisenhower  Ave.,

ATTN:    PERI-OU,  Alexandria,  VA  22323 
Paper  presented:    "An  Application  of  Tactical  Engagement

Simulation  for  Unit  Proficiency  Measurement"   1316 

KNIGHT,  Patricia

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
KOHL,  Delbert  E. 

Marine  Corps  Institute,  Marine  Barracks,  Box  1775, 

Washington,  DC  20013 
KOSKI,  LT  John  D. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

KRAIN,  Dr.  Burton  F. 

US  Civil  Service  Commission,  Intergovernmental  Personnel 

Programs  Division,  230  S.  Dearborn,  Chicago,  IL  60604 
KRIETEMEYER,  CDR  George  E. 

USCG  Aviation  Technical  Training  Center,  Elizabeth  City, 

North  Carolina  27909 
KUENZ,  Dr.  Marjorie  A.

Naval  Health  Sciences,  Education  &  Training  Command, 

National  Naval  Med.  Center,  Bethesda,  MD  20014 
Paper  presented:    "Systematic  Instructional  Validation 

Through  Testing"   275 

KUHNLE,  CDR  Robert  L. 

Leadership  Program  Staff,  USCG  Reserve  Training  Center, 

Yorktown,  VA  23690 
LAABS,  G.  J. 

Navy  Personnel  Research  &  Development  Center,  San  Diego,
California  92152
Paper  presented:    "Performance  Test  Objectivity:  Comparison
of  Interrater  Reliabilities  of  Three  Observation
Formats"   831

LAMBRECHT,  Marvin  W. 

1515  S.  Jefferson  Davis  Hwy,  Arlington,  VA  22202 

LANTERMAN,  Richard  S. 

US  Coast  Guard,  400  7th  St.,  SW,  Washington,  DC  20590 
Paper  presented:    "Content  Validation  of  Class  A  School 
Curricula  in  the  Coast  Guard"   1107 

LEECH,  LT  COL  Carl  A. 

Canadian  Forces  Directorate  of  Military  Occupational 
Structures,  National  Defense  Headquarters,  101  Colonel 
By  Drive,  Ottawa,  Ontario,  Canada  K1A  0K2
LEFROY,  MAJ  Dal  r

CF  PARU,  Suite  600,  North  York  Govt.  of  Canada  Bldg.,
Toronto,  Ontario,  Canada  M2N  6B7 

LEGER,  Marie

US  Army  Research  Institute,  5001  Eisenhower  Ave.,

Alexandria,  VA  22333 
Paper  presented:    "Validity  of  Associate  Ratings  of 

Performance  Potential  by  Army  Aviators"   

LEHMAN,  LT  Stanley  E.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

LEWIS,  Dr.  John  R.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
LEWIS,  Dr.  Mary  A. 

AAC-118  FAA/CAMI,  P.O.  Box  25082,  Oklahoma  City,  OK  73125 
Paper  presented:    "A  Comparison  of  Three  Models  for
Determining  Test  Fairness"

LINCOLN,  John  O.

Defense  Language  Institute,  English  Language  Center, 

Lackland  AFB,  TX  78236 

LINDSEY,  Shellie 

6403  E.  16th,  Anchorage,  AK  99504 

LINNON,  CDR  J.  L.

USCG  Training  Center,  Governors  Island,  New  York,  NY  10004 

LIU,  Georgina 

Army  Education  Center,  Ft.  Ord,  CA  93941 
LOFASO,  Anthony  J. 

Dynamics  Research  Corp.,  Wilmington,  MA 
Paper  presented:    "PAM:    A  Methodology  for  Predicting 

Air  Force  Personnel  Availability"   

LONG,  James  L.

CNET  (Code  N-531),  NAS  Pensacola,  FL  32508
LOTZ,  George  Jr. 

4713  NW  59th  Terrace,  Oklahoma  City,  OK  73122 

LOWE,  Muriel

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

MARCO,  Ruth  Ann

McDonnell  Douglas  Astronautics  Co.,  P.O.  Box  516, 

St.  Louis,  MO  63166
Paper  presented:    "Methodology  for  Selection  and  Training 
of  Artillery  Forward  Observers  Job  Analysis"  

MARTIN,  J.  Thomas  Jr.

Data  Design  Laboratories,  15  Koger  Executive  Center, 

Suite  140,  Norfolk,  VA  23502
Paper  presented:    "A  Comparison  of  Two  Criterion-Referenced 
Scoring  Procedures  for  an  Answer-Until-Correct,  Multiple-
Choice  Performance  Test"  
MARTIN,  LTJG  Thomas  J.

USCG  Headquarters  (G-PMR-5),  400  7th  St.,  SW,

Washington,  DC  20590 
MASSEY,  CAPT  Randy  H. 

AFHRL/PES,  Brooks  AFB,  TX  78235 
Paper  presented:    "The  Content  Issue  in  Performance 

Appraisal  Ratings"  

MATHEWS,  John  J. 

AFHRL,  Brooks  AFB,  TX  78235 
Paper  presented:    "Prediction  of  Reading  Grade  Levels  of 

Service  Applicants  from  Armed  Services  Vocational 

Aptitude  Battery  (ASVAB)"

nccN,  rdtriLid  m. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

MEREDITH,  CDR  Carlton  F. 

USCG  AVTRACEN,  Mobile,  AL  36608   

MEREDITH,  Dr.  John  B.  Jr. 

Data  Design  Laboratories,  P.O.  Box  12773,  15  Koger, 

Norfolk,  VA  23502 
Paper  presented:    "A  Comparison  of  Two  Criterion-Referenced 

Scoring  Procedures  for  an  Answer-Until-Correct,  Multiple-
Choice  Performance  Test"  

MERRILL,  M.  David 

Courseware,  Inc.  San  Diego,  CA 
Paper  presented:    "The  Instructional  Quality  Inventory: 

Introduction  and  Overview"  

MESSICK,  Vernon  D. 

NETPDC,  Ellyson,  Pensacola,  FL  32509 
MESSURA,  CWO3  Ronald  A.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
METTY,  CWO4  Cleo  F.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

MILLIGAN,  Dr.  John  R. 

US  Army  Research  Institute,  P.O.  Box  3066,  Ft.  Sill, 
Oklahoma  73503 

Paper  presented:    "A  Learning-Receptive  State  as  Induced  by 

an  Auditory  Signal  or  Frequency  Pulse"  

Paper  presented:    "Observer  Self-Location  Ability  and  its

Relationship  to  Cognitive  Orientation  Skills"   

MINTER,  CDR  Richard  W. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

MITCHELL,  LT  COL  Jimmy  L.

USAFOMC/ONY,  Stop  100,  Lackland  AFB,  TX  78236 
Paper  presented:    "Differential  Responses  on  Alternately
Anchored  Job  Rating  Scales"
MOBLEY,  Amelia  E.

USCG  400  7th  St.,  SW,  Washington,  DC  20590 
MOCHARNUK,  Dr.  John  B. 

McDonnell  Douglas  Astronautics  Co.  P.O.  Box  516, 

St.  Louis,  MO  63166 
Paper  presented:    "Methodology  for  Selection  and  Training  of 

Artillery  Forward  Observers  Job  Analysis"   

MONROE,  CWO3  Larry  N.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

MONTEMERLO,  Dr.  Melvin  D.

ATTSC-IT-TD,  US  Army  Training  Support  Center, 

Ft.  Eustis,  VA  23604 
Paper  presented:    "Task  Analysis:    Destination  or  Journey". 
MULDROW,  Tressle  W. 

Personnel  Research  &  Development  Center,  US  Civil  Service

Commission,  1900  E  St.,  NW,  Washington,  DC  20415 
Paper  presented:    "The  Impact  of  Valid  Selection  Procedures 

on  Workforce  Productivity"

MULLANE,  CAPT  Thomas  F. 

Service  School  Command,  Naval  Training  Center,  Orlando, 

Florida  32813 
MULLINS,  C.  J. 

AFHRL/PES,  Brooks  AFB,  TX  78235 
Paper  presented:    "The  Content  Issue  in  Performance 

Appraisal  Ratings"  

MURPHY,  John  W. 

USAIA  Test  Design  Coordinator,  Ft.  Benjamin  Harrison, 

Indiana  46218 
MUSSIA,  Stephen  J. 

Manager,  Personnel  Research  &  Eval.,  Minneapolis  Personnel

Dept.,  312  3rd  Ave.  South,  Minneapolis,  MN  55415 
MYERS,  David  C. 

Advanced  Research  Resources  Organization,  4330  East  West 

Hwy.,  Bethesda,  MD  20014 
Paper  presented:    "Analysis  of  Heavy  Equipment  Operator 

Jobs"   

McCLINTOCK,  Dr.  William  R. 

PD-10  Navy  Education  &  Training  Program  Development 

Center,  Bldg.  922,  NETPDC,  Ellyson,  Pensacola,  FL  32509
McCOY,  Linda  A. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

McDANIELS,  CWO2  Donald  M.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 
73169 

McINTOSH,  Virgil  M.

AF  Extension  Course  Institute  (EDV),  Gunter  AFS,  AL  36118 
McIVER,  LT  COL  Werner  W.

Headquarters,  USMC,  Office  of  Manpower  Utilization, 
Quantico,  VA  22134 

McKENZIE,  Robert  C.

Personnel  Research  &  Development  Center,  US  Civil  Service
Commission,  1900  E  St.,  NW,  Washington,  DC  20415 
Paper  presented:    "The  Impact  of  Valid  Selection  Procedures 
on  Workforce  Productivity". 
McLANATHAN,  COL  Frank  L. 

St.  Mary's  University,  San  Antonio,  TX  78284
McVAY,  Kenneth  W. 

USA  Missile  &  Munitions  Center  &  School,  ATTN:  ATSK-TD-PM, 
Redstone  Arsenal,  AL  35809 
NEFF,  Edwin  F. 

Assistant  Chief  Training  &  Education,  US  Coast  Guard,

400  7th  St.,  SW,  Washington,  DC  20590 
NELSON,  Oliver 

USAF/ATC  Randolph  AFB,  TX  78148 
NODDIN,  Ernest  M. 

Submarine  Medical  Research  Laboratory,  Box  900,  Submarine 
Base,  Groton,  CT  06340 
NOVAK,  Frank  J. 

Naval  Education  &  Training  Program  Development  Center, 
Classified  Instructional  Material  Division,  (PD-9) 
Bldg  942,  Ellyson,  Pensacola,  FL  32509 
NOWLIN,  Debra  L. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK

NUGENT,  William  A. 

Navy  Personnel  Research  &  Development  Center,  ATTN:  Code
9309,  San  Diego,  CA  92152 
Paper  presented:    "Performance  Test  Objectivity:  Comparison

of  Interrater  Reliabilities  of  Three  Observation  Formats"  831
O'CONNELL,  Joseph

Police  Training,  M.L.E.O.T.C. ,  7426  North  Canal  Rd., 

Lansing,  MI  48913 
O'LEARY,  Dr.  Brian  S. 

U.S.  Civil  Service  Commission,  1900  E  Street,  NW, 

Washington,  DC  20415 
Paper  presented:    "Construct  Validity"
OLIVER,  Dr.  L.  W.

Army  Research  Institute,  5001  Eisenhower  Ave., 

Alexandria,  VA  22333 
Paper  presented:    "Differential  Field  Assignment  Patterns 

for  Male  and  Female  Soldiers"  ...  396 
OLIVO,  CAPT  John   

USAF  Occupation  Measurement  Center,  Lackland  AFB,  TX 
78236 

Paper  presented:    "The  Use  of  Job  Satisfaction  Data  in  the 

Occupational  Survey  Program".    65 
OLSON,  Howard  C.

Advanced  Research  Resources  Org. ,  4330  East  West  Hwy. , 

Bethesda,  MD  20014 
Paper  presented:    "Analysis  of  Heavy  Equipment  Operator 

Jobs"   

ORRISON,  CPT  Stephen  L.

Director  of  Training  Development/USAIS,  ATTN:  ATSH-I-U- 

TDD,  Ft.  Benning,  GA  31905
OSBORNE,  James  E. 

NAVEDTRA  PRODEVCEN  (PD-3),  Ellyson  AFB,  Pensacola,  FL

32509 
OVERTON,  Deborah  J. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

PACKER,  CWO4  Harold  R.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
PARKS,  LT  Alton  J. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
PARSONS,  Tom

KENTRON  International,  Inc.,  P.O.  Box  35417,  Brooks  AFB,

San  Antonio,  TX  78235 
PASS,  Dr.  John  J. 

Navy  Personnel  Research  &  Development  Center,  San  Diego, 

California  92152 
Paper  presented:    "Sample  Size  and  Stability  of  Task 

Analysis  Inventory  Response  Scales"   

PASTENE,  ENS  Charles  R. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

PATTERSON,  CAPT  Gary  K. 

USAF  Sheppard  AFB,  Wichita  Falls,  TX  76308 

PEARLMAN,  Kenneth

US  Civil  Service  Commission,  1900  E  St.,  NW, 
Washington,  DC  20415 
Paper  presented:    "Test  of  a  New  Model  of  Validity
Generalization:  Results  for  Tests  Used  in  Clerical  Selection"

PESKOE,  Stuart  E. 

Dynamics  Research  Corporation,  Wilmington,  MA
Paper  presented:    "PAM:    A  Methodology  for  Predicting  Air 
Force  Personnel  Availability"   

PETERSON,  CWO3  Phillip  M.

USCG,  10  Philanne  Dr.,  Norwich,  CT  06360 

PHALEN,  William  J. 

AFHRL/ORA,  Brooks  AFB,  TX  78235 
Paper  presented:    "The  Development  of  a  Technique  for 
Using  Occupational  Survey  Data  to  Construct  and  Weight 
Computer-Derived  Test  Outlines  for  Air  Force  Specialty
Knowledge  Tests  (SKTs)"   
PHILLIPS,  Fredric  F.

Dynamics  Research  Corporation,  Wilmington,  MA 
Paper  presented:    "PAM:    A  Methodology  for  Predicting  Air 

Force  Personnel  Availability"  . 
PORTER,  George  V.  Jr. 

Director,  Cadet  Exams  &  Records,  USAF  Academy,  CO  80840 
POTTER,  LT  Earl  H.  III

US  Coast  Guard  Academy,  New  London,  CT  06320 
POVENMIRE,  H.  K.

USCG  Aviation  Training  Center,  Bates  Field,  Mobile, 
Alabama  36608 
POWELL,  Ladonna  A. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK

73169
PUZICHA,  Dr.  Klaus 

Regierungsdirektor  Bei,  Dezernat  Wehrpsychologie  IM,

Streitkrafteamt,  ABT.  I 
QUICK,  Bob  J. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 
73169 

RAMPTON,  LCOL  Glenn  M. 

Canadian  Forces  Personnel  Applied  Research  Unit,  4900 

Yonge  St.,  Willowdale,  Ontario 
Paper  presented:    "A  Strategy  for  Task  Analysis  and 

Criterion  Definition  Based  on  Multidimensional  Scaling"  . 
RAY,  MAJ  W.  D. 

Directorate  of  Evaluation,  USAMPS/TS,  Ft.  McClellan, 
Alabama  36205 
RECKASE,  Dr.  Mark  D.

University  of  Missouri,  4  Hill  Hall,  Columbia,  MO  65211 
Paper  presented:    "A  Generalization  of  Sequential  Analysis 
to  Decision  Making  with  Tailored  Testing"  .  .  . 
REINHARDT,  LCDR  William  H. 

US  Navy  Occupational  Development  &  Analysis  Center 
(NODAC),  Bldg  150,  Washington  Navy  Yard  (ANACOSTIA), 
Washington,  DC  20374 
RENEAU,  ENS  Lee 

USCG  Training  Center,  Cape  May,  NJ  08204 
RICHARDS,  Robert  E. 

The  Pennsylvania  State  University,  State  College,  PA 
Paper  presented:    "The  Instructional  Quality  Inventory: 
Introduction  and  Overview".  ... 
ROBERTS,  Fred  C. 

Naval  Health  Sciences,  Education  &  Training  Command, 
National  Naval  Med.  Center,  Bethesda,  MD  20014 
Paper  presented:    "Systematic  Instructional  Validation 
Through  Testing"  
ROBERTSON,  D.  W.

Navy  Personnel  Research  &  Development  Center,  San  Diego, 
CA  92152 

Paper  presented:    "Sample  Size  and  Stability  of  Task 

Analysis  Inventory  Response  Scales"    537 

ROID,  Gale 

Teaching  Research  Division,  Oregon  State  System  of  Higher 
Education,  Monmouth,  OR  97361 
Paper  presented:    "The  Emergence  of  an  Item-Writing 


Technology"   1035 

ROOT,  Robert  T. 

US  Army  Research  Institute,  5001  Eisenhower  Ave., 

Alexandria,  VA  22333 
Paper  presented:    "An  Application  of  Tactical  Engagement

Simulation  for  Unit  Proficiency  Measurement"   1316 

RUBRIGHT,  Earl 

80th  MTC,  556  Valleywood  Dr.,  Millersville,  MD
Paper  presented:    "Evaluation  of  Intelligence  Producing 

Capability  of  Selected  Combat  Arms  Units"    1205 

RUCK,  Hendrick  W. 

AFHRL/OR,  Brooks  AFB,  TX  78235 
Paper  presented:    "The  Collection  and  Prediction  of  Training 

Emphasis  Ratings  for  Curriculum  Development"   242 

Paper  presented:    "Methods  for  Collecting  and  Analyzing 

Task  Analysis  Data"   314 

Paper  presented:    "Methods  for  Determining  Safety  Training 

Priorities  for  Job  Tasks"    296 

Paper  presented:    "Obstacles  to  and  Incentives  for
Standardization  of  Task  Analysis  Procedures"    188

Paper  presented:    "A  Technique  for  Selecting  Electronic 

Specialties  for  Consolidation"   385 

RUMSEY,  M.  G. 


US  Army  Research  Institute,  5001  Eisenhower  Dr.,  Alexandria,


Virginia  22333 

Paper  presented:    "Development  of  the  Army  ROTC  Management
Simulation  Program  and  Instructors'  Orientation  Course"  .  1091 
RUX,  George  V. 

MEPCON,  MEPCT-P,  Bldg.  83,  Ft.  Sheridan  IL  60037 
Paper  presented:    "Development  of  a  Mobilization  Population 

Inventory  Using  Existing  ASVAB  Data  Banks"   645 

SANDS,  William  A. 

Navy  Personnel  Research  &  Development  Center  (Code  310), 
San  Diego,  CA  92152 
Paper  presented:    "Computer  Assisted  Reference  Locator 

(CARL)  System:    An  Overview"   470 
SARGENT,  Mildred  L.

Naval  Education  &  Training  Program  Development  Center, 

Ellyson  Field,  Pensacola,  FL  32509 
SCANLAND,  Dr.  Dorothy

US  Naval  Education  &  Training  Command  (Code  N-5),

Pensacola,  FL  32508 
SCANLAND,  Dr.  Worth

US  Naval  Education  &  Training  Command  (Code  N-5),

Pensacola,  FL  32508 
SCHIEMANN,  William  A.

Project  Manager,  AT&T,  Rm.  6126F2,  295  N.  Maple  Ave., 
Basking  Ridge,  NJ  07920 

SCHMIDT,  Frank  L.

Personnel  Research  &  Development  Center,  US  Civil  Service
Commission,  1900  E  St.,  NW,  Washington,  DC  20415

Paper  presented:    "The  Impact  of  Valid  Selection  Procedures 
on  Workforce  Productivity"

Paper  presented:    "Test  of  a  New  Model  of  Validity
Generalization:  Results  for  Tests  Used  in  Clerical  Selection"

SCHWARTZ,  CWO4  John  E.

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
SCOTT,  LT  Lynn  M. 

AFHRL/ORA,  Brooks  AFB,  TX  78235 
SEIBEL,  David

D.  E.  Siebel  &  Assoc.  LTD  (Canadian  Forces),  #1609  1275 

Richmond  Rd. ,  Ottawa,  Ontario  K2B  8E3 

SELLMAN,  MAJ  Wayne  S.

Air  Force  Manpower  and  Personnel  Center,  Randolph  AFB,

Texas  78418

Paper  presented:    "Prediction  of  Reading  Grade  Levels  of 
Service  Applicants  from  Armed  Services  Vocational 

Aptitude  Battery  (ASVAB)"   

SEUBERLICH,  COL  Hans-Erich

Chairman  Army  Section  DBwV  (Federal  Armed  Forces
Association),  Südstraße  123,  5300  Bonn  2,  W.  Germany
Paper  presented:    "Strain  by  Prolonged  Duty  Hours  and 
Problems  as  to  Mobility  of  Soldiers  -  As  Seen  by  Federal 
Armed  Forces  Association"   

SHIPLEY,  Brian  D.  Jr.

US  Army  Research  Institute  Field  Unit,  P.O.  Box  476, 

PERI-OA,  Ft.  Rucker,  AL  36362

Paper  presented:  "Complexity  of  Flight  Path  Data  as  an 
Index  of  Skill  in  Piloting  Performances  from  a  Flight 
Simulator  Based  Job-Sample  Test"

Paper  presented:  "Learning  Aptitude,  Error  Tolerance,  and 
Achievement  Level  as  Factors  of  Performance  in  a  Visual-
Tracking  Task"  
SHIVELY,  Albert  E.

Naval  Education  &  Training  Program  Development  Center, 

Ellyson,  Pensacola,  FL  32509 
SHOEN,  William  R. 

Service  School  Command,  Naval  Training  Center,  Orlando, 

FL  32813 
SILVERSTEIN,  Jerome  H. 

Defense  Language  Institute,  English  Language  Center, 

Lackland  AFB,  TX  78236 
SIMS,  Dr.  Bill 

Center  for  Naval  Analyses,  1401  Wilson  Blvd.  Arlington, 

Virginia  22209 
SKOFSTAD,  Dennis 

Aviation  Technical  Training  Center,  Elizabeth  City, 

North  Carolina  27909 
SMITH,  Dr.  Bea  H. 

Naval  Amphibious  School,  Coronado,  San  Diego,  CA  92155 
SMITH,  H.  Wayne 

Dept.  of  Psychology,  Texas  Tech  University,  Lubbock,  TX 

79409 

Paper  presented:    "An  Analysis  of  the  OE  Concept  and 

Suggested  Improvements"    942 

SOLOMON,  Elberta 

PD5,  Naval  Education  &  Training  Program  Development 

Center,  Ellyson,  Pensacola,  FL  32509 
SPRAGUE,  LT  Chester  M. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 

STAMM,  LTJG  James  A. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 
73169 

STEFFEN,  Dale  A.

Electronics  Division,  Denver  Research  Institute,
University  of  Denver,  P.O.  Box  10127,  Denver,  CO  80210
Paper  presented:    "Evaluation  of  Troubleshooting  Simulator"  1249 

STEPHENSON,  Donald  P. 

Staff  &  Faculty  Division,  Office  of  DAC  for  EEL  Tech, 
USAARMS,  Ft.  Knox,  KY 

STEPHENSON,  Dr.  Robert  W. 

AF  Human  Resources  Laboratory,  Brooks  AFB,  TX  78235 
Paper  presented:    "Obstacles  to  and  Incentives  for
Standardization  of  Task  Analysis  Procedures"    188

STERLING,  Martha  E. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 
73169 

STEWART,  RADM  W.  H. 

Chief,  Office  of  Personnel,  USCG,  400  7th  St.,  SW,

Washington,  DC  20590 
Paper  presented  (Keynote  Address):    "Quality  of  Life"  ...  xi 
STIMATZ,  LT  J.  Anthony

Dept.  of  Mathematics,  USCG  Academy,  New  London,  CT  06320 
SVEJKOVSKY,  Mary  L. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
TAKAHASHI,  Terry 

Defense  Intelligence  School,  Washington,  DC  20374 
TALLEY,  John  W. 

SWT  Division,  ATSK-TD-AD-A,  Bldg  3342,  USAMMCS,  Redstone 

Arsenal,  AL  35809 
TARTELL,  J.  S. 

Academy  of  Health  Sciences,  Ft.  Sam  Houston,  TX  78234 
Paper  presented:    "Job  Analysis  in  the  US  Army  Medical 

Training  Environment"   

TAYLOR,  Donald  F. 

CG  Research  &  Development,  2366  Antiqua  Ct.  Reston,  VA 

22091 

TAYLOR,  CAPT  Ronald  L. 

Extension  School,  Education  Center,  Marine  Corps 

Development  &  Education  Command,  Quantico,  VA  22134 
TEMPLEMAN,  Max 

Chief,  Education  Branch,  US  Army  Support  Command,  DPCA, 

USASCH,  Ft.  Shafter,  HI  96858 
THAIN,  John  W. 

Defense  Language  Institute,  Presidio  of  Monterey, 

Monterey,  CA  93940 
Paper  presented:    "Monte  Carlo  Computer  Programs  for 

Simulating  Selection  Decisions  from  Personnel  Tests".  .  . 
THEW,  Michael  C. 

AFHRL/SMAZ,  Brooks  AFB,  TX  78235 
Paper  presented:    "CODAP:    A  New  Modular  Approach  to 

Occupational  Analysis"  

THOMAS,  Patricia  J. 

Navy  Personnel  Research  and  Development  Center,  San  Diego, 

California  92152 
Paper  presented:    "The  Premature  Attrition  Rate  of  Navy 

Female  Enlistees"   

THOMPSON,  CDR  George  J. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
THOMPSON,  Nancy 

AFHRL/OR,  Brooks  AFB,  San  Antonio,  TX  78229 
Paper  presented:    "The  Collection  and  Prediction  of  Training 

Emphasis  Ratings  for  Curriculum  Development"  

Paper  presented:    "Methods  for  Determining  Safety  Training 

Priorities  for  Job  Tasks"   
THOMSON,  David  C.

609  Sunhaven  Dr.,  San  Antonio,  TX  78239 
Paper  presented:    "Benchmark  Scales  for  Collecting  Task 

Training  Factor  Data"   

Paper  presented:    "The  Collection  and  Prediction  of  Training 

Emphasis  Ratings  for  Curriculum  Development"  

THURING,  LT  Allen  R. 

USCG  Reserve,  943  N.  Liberty  St.,  Arlington,  VA  22205 
TRAHNER,  Marvin  H. 

US  Civil  Service  Commission,  1900  E  Street,  NW, 
Washington,  DC  20415 
Chairman,  Symposium:    "Innovative  Test  Validation 

Strategies"   

Paper  presented:    "Synthetic  Validity"  

VALENTINE,  Dr.  Lonnie  D.  Jr. 

6205  Rue  Francois,  San  Antonio,  TX  78238 
Paper  presented:    "Air  Force  Experience  with  PROJECT 

TALENT"  

Paper  presented:    "Prediction  of  Reading  Grade  Levels  of 
Service  Applicants  from  Armed  Services  Vocational  Aptitude 

Battery  (ASVAB)"  

VAN  NOSTRAND,  Sally  J.

US  Army  Research  Institute,  5001  Eisenhower  Ave.,
Alexandria,  VA  22333 
Paper  presented:    "Occupational  Analysis  for  Field  Grade 

Army  Officers"  

VAUGHAN,  CAPT  David  S. 

ATC  Technology  Applications  Center,  Lackland  AFB,  TX 

78236 

Paper  presented:    "Job  Performance  of  USAF  Bypassed 

Specialists"  

Paper  presented:    "Two  Applications  of  Occupational  Survey

Data  in  Making  Training  Decisions"  

VOORHEES,  Phyllis  L. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169

WALDKOETTER,  Dr.  Raymond  O.

US  Army  Research  Institute,  P.O.  Box  3066,  Ft.  Sill, 
Oklahoma  73503 

Paper  presented:    "A  Learning-Receptive  State  as  Induced  by 
an  Auditory  Signal  or  Frequency  Pulse"

Paper  presented:    "Observer  Self-Location  Ability  and  Its 
Relationship  to  Cognitive  Orientation  Skills"   

Paper  presented:    "Prediction  of  Field  Artillery  Officer 
Performance"
WALLIS,  M.  Reid

Richard  A.  Gibboney  Associates,  Kensington,  MD 
Paper  presented:    "Occupation  Analysis  for  Field  Grade  Army 

Officers"   373 

WARM,  Thomas  A. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK

73169 

Paper  presented:  "A  Primer  of  Item  Response  Theory".  .  .  .  884 
WEBER,  CAPT  Elena  J. 

USAF  Occupational  Measurement  Center,  Lackland  AFB,  TX 
Paper  presented:    "The  Use  of  Job  Satisfaction  Data  in  the 

Occupational  Survey  Program"   65 

WEHR,  CDR  Robert  H. 

USCG  Aviation  Training  Center,  Bates  Field,  Mobile,  AL 
36608 

WEHRENBERG,  STC  Stephen  B. 

USCG  Reserve  Training  Center  (OGLAMS),  Yorktown,  VA  23690 
WEISSMULLER,  Johnny  J. 

AFHRL/SMAZ,  Brooks  AFB,  TX  78235 
Paper  presented:    "CODAP:    A  New  Modular  Approach  to  Occupational
Analysis"   ^62

WELDON,  Dr.  John  I. 

US  Army  Training  and  Doctrine  Command,  Alexandria,  VA 
Paper  presented:    "Quality  of  ROTC  Accessions  to  the  Army 

Officer  Corps"   488 

WELDON,  Roland  L.

Course  Development  Division,  US  Army  Aviation  Center, 

Ft.  Rucker,  AL  36362 
WELLINS,  Dr.  Richard  S. 

US  Army  Research  Institute,  5001  Eisenhower  Ave., 

Alexandria,  VA  22333 
Paper  presented:    "Development  of  the  Army  ROTC  Management 

Simulation  Program  and  Instructors'  Orientation  Course"   1091
Paper  presented:    "Quality  of  ROTC  Accessions  to  the  Army 

Officer  Corps"   488 

WELSH,  CAPT  John  R.  Jr. 

3307  School  SQ.,  Lackland  AFB,  TX  78236
Paper  presented:    "Evaluation  of  the  MODIA  Planning 

System"   1335 

WERNER,  MAJ  Gerald  C. 

Dept.  of  Army  Individual  Training,  HQ  DA,  ATTN:  DAMO-TRI, 

The  Pentagon,  Washington,  DC  20310 
WEST,  Anita  S. 

Denver  Research  Institute,  University  of  Denver,  P.O.  Box 
10127,  Denver,  CO  80210 
Paper  presented:    "Evaluation  of  Troubleshooting  Simulator"  1249 
WHITE,  Jonathan

Police  Training,  M.L.E.O.T.C.,  7426  N.  Canal  Rd.,

Lansing,  MI  48913 
WILCOVE,  Gerry  L. 

Navy  Personnel  Research  &  Development  Center,  San  Diego, 

California  92152 
Paper  presented:    "The  Premature  Attrition  of  Navy  Female 

Enlistees"   420 

WILLHOITE,  CAPT  Robert  R. 

The  National  Bank  of  Commerce,  Altus,  OK  73521 
WILLIAMS,  Rayburn  A. 

Chief  of  Naval  Education  &  Training  N-53,  Naval  Air 

Station,  Pensacola,  FL  32506 
WILLIAMSON,  Sharon  A. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
WILLING,  Richard 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
WINKLER,  Edward  B. 

Human  Resources  Management  School,  Naval  Air  Station, 

Millington,  TN 
WINN,  Francis  J.  Jr. 

Medical  Unit,  USCG  Support  Center,  Governors  Island, 

New  York,  NY  10004 
WISKOFF,  Dr.  Martin 

Navy  Personnel  Research  &  Development  Center,  5151 

Dixel  Dr.,  San  Diego,  CA  92115 
WITTMAN,  LTC  Clarence  E.

US  Army  Directorate  of  Evaluation,  US  Army  Field

Artillery  School,  Ft.  Sill,  OK  73503 
WOOD,  Norman  D. 

The  Pennsylvania  State  University,  State  College,  PA 
Paper  presented:    "The  Instructional  Quality  Inventory: 

Introduction  and  Overview"   1138 

WORD,  LTC  Larry  E. 

US  Army  Training  Support  Center,  Ft.  Eustis,  VA 
Paper  presented:    "An  Application  of  Tactical  Engagement 

Simulation  for  Unit  Proficiency  Measurement"   1316 

WORSTINE,  Darrell  A.

USA  Military  Personnel  Center,  DAPC-MSP-D,  200  Stovall

St.,  Alexandria,  VA  22332 
Paper  presented:    "General  Overview  and  Initial  Findings  of 

the  Project  on  Job  Satisfaction  and  Retention  of  U.S. 

Army  Enlisted  Personnel"   75
WULFECK,  Wallace  H.  II

Navy  Personnel  Research  &  Development  Center,  Code  304, 

San  Diego,  CA  92152 
Paper  presented:    "The  Instructional  Quality  Inventory: 

Introduction  and  Overview"  

YOUNG,  LT  Larry  C. 

USCG  Institute,  P.O.  Substation  18,  Oklahoma  City,  OK 

73169 
BLAND,  CDR  R.  D.

USCG  Training  Center,  Cape  May,  N.J.  08204 
CARLSON,  Robert  R. 

Commandant  (G-PTE),  U.S.  Coast  Guard,  Washington,  D.C.  20590
CHIPPENDALE,  Joan 

TRACEN  Governors  Island,  NY  10004 
CRUICKSHANK,  James  G. 

Commandant  (G-PTE),  U.S.  Coast  Guard,  Washington,  D.C.  20590
DONOHOE,  CAPT  L.V. 

USCG  Training  Center,  Cape  May,  N.J.  08204 
GARCIA,  LT  Rebecca  M. 

Commander  (r),  11th  Coast  Guard  District,  400  Oceangate,

Long  Beach,  CA  90822 
GREENFIELD,  LCDR  J.  T. 

Commander  (r),  11th  Coast  Guard  District,  400  Oceangate,

Long  Beach,  CA  90822 
JOYCE,  LCDR  E.  P.

U.S.  Coast  Guard  Academy,  New  London,  CT  06320 
PALESE,  Robert 

TRACEN  Governors  Island,  NY  10004 
SANOK,  CDR  Gregory  J. 

USCG  TRACEN,  Gov't  Island,  Alameda,  CA  94501 
THRALL,  CAPT  F.  E. 

TRACEN  Governors  Island,  NY  10004